to the Kullback-Leibler divergence, and that the least favorable densities
are  those  that  minimize  this  quantity.  The  robust  quickest  detector  is 
also  determined  for  the  weak  signal  case,  and  we  show  an  equivalence  between 
the  performance  measure,  the  classical  efficacy,  and  Fisher’s  information. 
Performance  curves  are  given  to  show  the  gain  available  when  robustness  is 
built  into  the  procedure. 

The  robust  quickest  detector  is  also  derived  under  mean  and  covariance 
uncertainty  for  a  multivariate  Gaussian  noise  process.  It  is  shown  that 
the  robust  processor  is  exactly  the  robust  discrete-time  matched  filter, 
which  has  been  studied  previously.  Expressions  for  the  asymptotic 
performance  are  derived,  and  particular  solutions  are  presented  for  several 
uncertainty  classes.  Performance  curves  are  provided  to  illustrate  the 
tradeoffs  when  there  is  a  mismatch  between  the  assumed  and  actual  levels  of 
uncertainty.  The  applicability  of  the  robust  procedure  to  non-Gaussian  noise 
is  also  discussed. 

Next,  quickest  detection  procedures  for  the  fusion  processor  of  a 
distributed  detection  system  are  investigated.  An  optimal  procedure  is 
derived  and  compared  to  several  alternative  methods  which  are  easier  to 
implement  in  that  they  are  recursive  and  require  less  computation.  A  simple 
method  for  choosing  the  thresholds  of  the  local  detectors  is  given,  and  a 
sensitivity  analysis  reveals  that  this  choice  results  in  overall  system 
performance  that  is  close  to  optimal.  Lastly,  performance  curves  are 
presented  which  illustrate  the  tradeoff  between  performance  gain  and  channel 
bandwidth. 

Finally,  an  adaptive  procedure  is  proposed  which  is  suitable  for  the 
disorder  problem  when  a  jump  of  unknown  magnitude  occurs  in  the  mean  of  a 
random  process.  It  is  shown  that  this  test  exhibits  asymptotic  performance 
that  is  similar  to  the  test  which  is  optimal  for  known  jump  magnitude.  The 
adaptive  procedure  is  implemented  to  detect  a  change  in  the  rate  parameter 
of a Poisson process.

Chapter 1


Introduction 


1.1  Motivation 

This dissertation focuses on sequential techniques for detecting a change, or disorder,
in  the  statistics  of  a  random  process.  A  disorder  can  be  as  simple  as  a  shift  in  the 
mean  from  one  constant  to  another,  or  as  complex  as  a  sudden  change  in  the  dynamic 
profile  of  multiple  parameters.  In  either  case,  the  overall  goal  is  to  determine  as  soon 
as  possible  that  the  change  occurred,  while  at  the  same  time  minimizing  the  chance 
of  falsely  signalling  the  occurrence  of  a  disorder  in  the  absence  of  a  change.  In  other 
words,  we  are  seeking  quickest  detection  procedures. 

Many signal processing techniques assume that the parameters that characterize
the data are either stationary or only slowly time-varying. However, there are numerous
situations where this assumption does not hold. In such cases, quickest detection
procedures can be used to signal the change so that some corrective action can be
taken. Any area in which abrupt changes in the nature of a signal occur can potentially
benefit from the use of quickest detection procedures. Many such examples can
be found in the recent book by Basseville and Nikiforov [1].

The  selection  of  a  procedure  for  disorder  detection  is  largely  dependent  on  the 
particular application, as well as the amount of a priori information about the data.
In this thesis, three types of quickest detection problems are investigated: robust
techniques which are suitable when the noise distributions are only partially known,
quickest detection procedures designed for the fusion processor of a distributed
detection system, and an adaptive procedure suitable for the case when the disorder is
a  jump  in  the  mean  of  unknown  magnitude.  In  each  case,  we  are  especially  interested 
in  seeking  procedures  that  can  be  implemented  recursively,  making  them  suitable  for 
on-line  use. 


1.2  Thesis  Content 

The body of this dissertation is divided into five chapters. Chapter 2 lays the foundation
for the remainder of the thesis, as much of the notation and definitions are used
in  subsequent  chapters.  The  disorder  problem  is  presented  formally,  and  previous 
work  that  is  central  to  the  field  is  reviewed.  Many  of  the  results  of  this  work  are 
represented  by  various  performance  curves,  obtained  either  by  direct  computation  or 
via  Monte  Carlo  methods;  the  algorithms  used  to  generate  these  plots  are  presented 
here. 

Chapter  3  begins  a  study  of  robust  quickest  detectors.  In  many  cases  where 
quickest  detection  techniques  would  be  desirable,  the  underlying  statistical  model  may 
not  be  precisely  known.  Simply  modelling  the  noise  as  Gaussian  in  this  situation  may 
result  in  the  following  problems:  (1)  the  actual  false  alarm  rate  may  differ  significantly 
from  the  desired  value,  and  (2)  detectability  may  be  sacrificed  by  simply  increasing 
the  decision  threshold.  In  order  to  alleviate  these  problems,  we  derive  the  minimax 
robust  detector  based  upon  a  lower  bound  on  the  asymptotic  performance  of  Page’s 
test.  The  robust  procedure  is  derived  for  the  epsilon-contaminated  and  total  variation 
classes,  both  of  which  are  useful  in  modelling  real-world  uncertainty.  The  performance 
of the robust procedure is compared to several nonparametric versions of Page’s test,
which were studied in detail in [2]. The minimax robust procedure is also derived for
the  small-signal  case.  It  is  demonstrated  that  the  robust  procedure  results  in  good 
performance  over  a  wide  range  of  noise  distributions. 

In  Chapter  4,  we  consider  the  minimax  robust  quickest  detection  problem  when 
the  noise  distribution  is  multidimensional  Gaussian.  It  is  shown  that  this  problem 
is  closely  related  to  the  previous  work  of  Verdu  and  Poor  [3]  on  minimax  robust 
matched  filtering,  and  that  the  solution  to  the  latter  problem  can  be  used  to  solve 
the  former.  The  robust  procedure  is  derived  for  both  signal  and  noise  uncertainty, 
where  the  uncertainty  is  modelled  as  the  deviation  from  some  nominal  parameters. 
The application of the robust quickest detector in multivariate non-Gaussian noise is
also  discussed. 

In  Chapter  5,  we  study  the  problem  of  determining  as  quickly  as  possible  the 
occurrence  of  a  disorder  in  a  decentralized  decision  environment.  Here,  a  number  of 
sensors  are  used  to  monitor  some  phenomenon.  The  decision  as  to  the  presence  or 
absence  of  a  disorder  is  made  at  a  central  processor,  or  fusion  center,  based  upon  a 
summarized  version  of  the  sensor  data.  The  processor  receives  a  set  of  local  binary 
decisions  at  regular  intervals,  where  each  decision  indicates  either  “disorder  present” 
or  “no  disorder  present.”  The  optimal  procedure  in  the  maximum  likelihood  sense  is 
derived  for  this  problem.  For  each  set  of  local  decisions,  the  fusion  center  must  perform 
a  search  over  all  possible  disorder  times,  a  task  which  could  become  prohibitive  when 
the  local  decisions  are  based  on  a  large  number  of  samples.  It  is  shown  that  a 
small  simplification  can  be  made  to  eliminate  the  need  for  this  search;  this  yields 
a  suboptimal  procedure  which,  nevertheless,  exhibits  performance  nearly  identical  to 
the  optimal  version. 

In  perhaps  the  most  significant  contribution  of  this  chapter,  we  propose  a  new 
simple  method  for  choosing  the  thresholds  of  the  local  detectors  based  upon  a  lower 
bound on the asymptotic performance measure, which we derive for the distributed
detection problem. Direct optimization of this performance would require the solution
of  a  set  of  constrained  nonlinear  equations.  By  comparison,  the  optimization  of  the 
lower  bound  on  asymptotic  performance  is  easy,  requiring  little  computation.  A 
sensitivity  analysis  reveals  that  the  new  method  results  in  overall  system  performance 
which  is  close  to  optimal;  this  is  particularly  true  when  the  false  alarm  rate  is  low, 
a  condition  which  is  desirable  in  many  realistic  scenarios.  Each  local  decision  is 
generated  based  upon  a  set  of  sequential  sensor  samples.  In  general,  as  the  number  of 
samples  per  local  decision  increases,  both  the  required  channel  bandwidth  to  transmit 
the  local  decisions  and  the  relative  performance  of  the  overall  procedure  decrease.  We 
conclude  the  chapter  by  assessing  this  tradeoff.  While  perhaps  contrary  to  intuition, 
it  is  shown  that  for  the  weak  signal  case,  sending  the  local  decision  as  frequently  as 
possible  does  not  result  in  the  best  performance. 

In Chapter 6, we investigate the disorder problem when a jump of unknown magnitude
occurs in the mean of a random process. An adaptive procedure is proposed
that  consists  of  two  stages  which  operate  sequentially:  the  first  is  a  version  of  Page’s 
test  designed  for  a  jump  of  minimum  magnitude;  the  second  is  an  adaptive  version 
of  the  classical  Wald  sequential  probability  ratio  test.  The  rationale  behind  such 
a  test  lies  in  the  difficulty  of  reliably  estimating  the  pre-  or  post-disorder  means  in 
the vicinity of the disorder time. For example, an estimate of the pre-disorder mean
could easily be corrupted by samples from the post-disorder hypothesis, since
the  disorder  time  is  unknown.  The  two-stage  procedure  provides  a  means  to  separate 
the  two  hypotheses  (with  some  probability  of  error)  so  that  the  estimate  of  the  mean 
after  the  disorder  will  be  more  reliable.  It  is  shown  that  the  adaptive  test  has  similar 
asymptotic  performance  to  the  test  which  is  optimal  for  known  jump  size.  It  also 
has the advantage of being recursive, more easily lending itself to on-line implementations.
The procedure is implemented to detect a change in the rate parameter of a
Poisson process. However, it is also applicable to other distributions.

Finally, the original contributions of this thesis are reviewed in Chapter 7.

Chapter 2


Quickest  Detection:  A  Review 


The  main  body  of  this  dissertation,  Chapters  3  through  6,  focuses  on  various  problems 
that  fall  into  the  general  category  of  quickest  detection.  The  purpose  of  this  chapter  is 
to  review  some  of  the  previous  work  in  this  area,  as  well  as  to  introduce  the  definitions 
and  notation  that  will  be  used  throughout  this  thesis.  Thus,  this  chapter  will  serve 
as  a  major  reference  for  each  of  the  subsequent  chapters. 

In  Section  2.1,  the  disorder  problem  is  first  presented  in  a  very  general  sense,  and 
the  goal  of  quickest  detection  is  stated.  Some  of  the  assumptions  that  will  be  made 
throughout  this  thesis  are  also  given.  Section  2.2  introduces  a  procedure  for  on-line 
disorder  detection  known  as  Page’s  test;  the  optimality  of  this  test  is  also  discussed. 
In  Section  2.3,  we  define  the  asymptotic  performance  measure  for  Page’s  test.  This 
measure  is  a  very  useful  quantity,  and  is  a  starting  point  for  many  of  the  results 
in  this  thesis.  Section  2.4  presents  several  additional  methods  that  may  be  used  to 
compute  the  performance  of  quickest  detection  procedures.  Finally,  in  Section  2.5, 
several  extensions  of  quickest  detection  procedures  to  more  complicated  models  are 
presented. 

Much  of  the  material  in  this  chapter  can  be  found  in  Chapter  2  of  the  Ph.D.  thesis 
of Broder [5] and in the recent book by Basseville and Nikiforov [2].

2.1 The Disorder Problem

Consider a sequence of random variables $\{X_i\}$, where random variable $X_i$ has
conditional density $f(X_i \mid \theta; X_1^{i-1})$, with $X_1^{j} = (X_1, \ldots, X_j)$; here, $\theta$ is some
scalar or vector quantity that parameterizes the conditional density. Suppose that
$\theta = \theta_0$ for $i = 1, \ldots, m-1$, and $\theta = \theta_1$ for $i = m, m+1, \ldots$. In other words, the
random variables undergo a disorder at time instant $m$, which is called the disorder
time. The goal is to detect the change as soon as possible.¹ Thus, one wishes to detect
a shift from hypothesis $H_0$ to hypothesis $H_1$, where

$$H_0: X_i \sim f(X_i \mid \theta_0; X_1^{i-1})$$

$$H_1: X_i \sim f(X_i \mid \theta_1; X_1^{i-1})$$

The  above  problem  is  phrased  in  terms  of  an  on-line  framework:  the  samples  are 
received  sequentially,  and  a  decision  regarding  the  occurrence  of  a  disorder  is  made 
at each sample time.²

The  change  detection  problem  can  alternatively  be  formulated  using  an  off-line 
approach, where the decision is based on a finite “block” of samples $X_1, X_2, \ldots, X_n$.
The problem here is to determine which of the hypotheses

$$H_0: X_i \sim f(X_i \mid \theta_0; X_1^{i-1}), \quad i = 1, \ldots, n$$

$$H_1: \begin{cases} X_i \sim f(X_i \mid \theta_0; X_1^{i-1}), & i = 1, \ldots, m-1 \\ X_i \sim f(X_i \mid \theta_1; X_1^{i-1}), & i = m, \ldots, n \end{cases}$$

holds. The off-line problem can be useful in situations where either the size, $n$, of
the data window is small, or otherwise where there is ample computing power and
memory for data storage. However, in most engineering applications, one is interested
in procedures that require little memory and may be implemented sequentially. In
addition, it is sometimes the case that the disorder can be reliably detected using
fewer than the $n$ samples contained in the fixed block of data. Various techniques for
off-line disorder detection are covered in [9]. However, in this thesis, we will consider
only the on-line problem.

¹ In this case, the disorder is a jump change in $\theta$, but more general types of disorders may also
be considered; some examples will be given in Section 2.5.

² The hypotheses can be written more generally as a change in the distribution $F(X_i \mid \theta; X_1^{i-1})$.
In this work, it is assumed throughout that the density functions exist, and so all of the expressions
will be written in terms of $f(\cdot \mid \cdot)$.

The  following  assumptions  will  be  made  throughout  the  thesis: 

•  The  disorder  time  m  is  unknown. 

Two approaches are typically used in modelling the disorder time: the Bayesian
approach, where $m$ is modelled as a random variable, and the maximum likelihood
(ML) approach, where $m$ is taken to be unknown but nonrandom. The Bayesian approach
was first investigated in [18], and is based on the assumption that the prior
probability of the disorder time is known. On the other hand, the ML approach
is more realistic when little is known about the disorder time, such as in situations
where the waiting time before the disorder occurs is potentially very long.
Examples include radar warning systems, where the threat (e.g., a missile) suddenly
appears over the horizon, and communication link monitoring [16], where
the channel characteristics may change suddenly due to some defect. In such
instances, it may not be possible to accurately characterize the distribution of
$m$.

•  The observations are independent.

In this case, the on-line hypotheses become

$$H_0: X_i \sim f(X_i \mid \theta_0) = f_0(X_i)$$

$$H_1: X_i \sim f(X_i \mid \theta_1) = f_1(X_i)$$

This  assumption  is  made  to  simplify  the  problem,  and  will  result  in  simpler 
algorithms. Examples of applications of quickest detection on uncorrelated data
include the detection of failures in linear systems via the monitoring of the
innovations process [21], and the detection of changes in the drift in systems
that can be modelled as a stochastic differential equation of the following form:

$$dX_t = \theta_t\,dt + dW_t$$

where

$$\theta_t = \begin{cases} \theta_0, & t < t_0 \\ \theta_1, & t \ge t_0 \end{cases}$$

and $\{W_t\}$ is a Wiener process [22].³ The latter can be used to model radar
return, where the change in drift occurs when a target emerges. Other work
investigates the problem of detecting disorders when the data is correlated.
Examples where such problems arise are given in Section 2.5.

•  The disorder is a jump change in the mean.

We consider the case where the parameter $\theta$ is simply the mean of the process,
and $\theta$ undergoes a one-time positive jump from $\theta_0$ to $\theta_1 > \theta_0$. Not only
does  this  assumption  simplify  the  problem,  but  it  is  also  a  reasonable  model  for 
a  large  number  of  physical  systems  of  interest.  For  example,  the  sudden  failure 
of  a  device  in  a  system  may  lead  to  a  step  change  in  the  output.  Also,  sudden 
changes  in  spectral  energy  can  be  detected  by  testing  for  jumps  in  the  coefficients 
of  the  energy  spectral  density  [5].  Some  examples  involving  more  complicated 
changes  are  given  in  Section  2.5. 


³ $X_t$ is uncorrelated when $\theta_t$ is scalar. For the vector case, the observables will likely be correlated.

2.2 Page’s Test

In 1954, E. S. Page [15] introduced the following sequential procedure for detecting a
shift from $H_0$ to $H_1$. Define the cumulative sum (CUSUM) statistic

$$T_j^k = \sum_{i=j}^{k} g(X_i) \tag{2.1}$$

where $g(x) = \log\frac{f_1(x)}{f_0(x)}$, with the convention that $T_j^k = 0$ if $j > k$. An alarm sounds
(i.e., a disorder is declared) when the stopping time $N$ occurs, where

$$N = \inf\left\{ n \;\middle|\; T_1^n - \min_{0 \le k \le n} T_1^k > h \right\} \tag{2.2}$$

and $h > 0$ is some threshold. This procedure is commonly known as Page’s test.
Intuitively, one can see that this procedure terminates when the difference of the
cumulative sum and its past minimum exceeds the threshold. It is easy to verify that
when $g(\cdot)$ is the log-likelihood ratio, then $E[g(x) \mid H_0] < 0 < E[g(x) \mid H_1]$.⁴ Therefore,
$T_1^n$ is seen to have a drift which is negative before the disorder and positive afterwards,
and Page’s procedure reacts to this change in drift. An example illustrating this point
is shown in the upper plot of Figure 2.1.

⁴ This is done by using the relationships $1 - \frac{1}{x} \le \log x \le x - 1$, which are sometimes referred to
as the “IT inequalities”. We have

$$E[g(x) \mid H_0] = \int \log\frac{f_1(x)}{f_0(x)}\, f_0(x)\,dx \le \int \left(\frac{f_1(x)}{f_0(x)} - 1\right) f_0(x)\,dx = 0$$

and

$$E[g(x) \mid H_1] = \int \log\frac{f_1(x)}{f_0(x)}\, f_1(x)\,dx \ge \int \left(1 - \frac{f_0(x)}{f_1(x)}\right) f_1(x)\,dx = 0$$

Equality holds only in the degenerate case where $f_0(x) = f_1(x)$.

Figure 2.1: Two versions of Page’s test.

Now consider the off-line version of the same problem discussed in Section 2.1. In
particular, recall the definition of the hypotheses $H_0$ and $H_1$. The ML procedure for
the off-line problem is to declare a disorder when the maximum of the log-likelihood
ratio between $H_0$ and $H_1$ over all possible disorder times exceeds a threshold; in other
words, declare a disorder in case

$$\max_{1 \le m \le n} \log\frac{f(X_1^n \mid H_1)}{f(X_1^n \mid H_0)} = \max_{1 \le m \le n} \sum_{i=m}^{n} \log\frac{f_1(X_i)}{f_0(X_i)} > h \tag{2.3}$$

It is not difficult to see that a sequential implementation of the above test, where a
disorder is declared when the stopping time

$$N = \inf\left\{ n \;\middle|\; \max_{1 \le m \le n} \sum_{i=m}^{n} g(X_i) > h \right\}$$


occurs,  is  equivalent  to  the  procedure  in  (2.2),  where  again  g  is  the  log-likelihood 
ratio.  However,  notice  that  with  the  off-line  procedure,  all  past  samples  must  be 
available  at  each  iteration,  while  the  sequential  test  only  requires  the  storage  of  the 
past minimum of $T_1^n$. Thus, (2.2) can be interpreted as the on-line version of the ML
procedure for detecting the change.

In [15], it is shown that an equivalent version of (2.2) is given by

$$N = \inf\{ n \mid S_n > h \} \tag{2.4}$$

where $S_n$ is generated by the recursion

$$S_n = \max\{ S_{n-1} + g(X_n),\; 0 \} \tag{2.5}$$

This version of Page’s test will be used exclusively throughout the thesis; it is illustrated
in the lower plot of Figure 2.1. Notice that $S_n$ and $T_1^n$ are exactly the same
after the disorder time, and that both procedures react when the upward drift exceeds
the threshold $h$. Like (2.2), this procedure is recursive and suitable for on-line
applications. This version also has the advantage that the test statistic $S_n$ always lies
in the interval $[0, h]$. On the other hand, $T_1^n$ can potentially become very large
in magnitude if the disorder occurs only after a long time; this could cause roundoff
errors if it were necessary to quantize the samples using only a few bits. Also, observe
that the test statistic in (2.5) can be interpreted as a repeated sequential probability
ratio test (SPRT) with the continuation region $[0, h]$ in the following sense: if $S_n < 0$
(i.e., the lower boundary is crossed), the statistic is reset to zero and a new SPRT
commences; if the upper boundary $h$ is crossed, the test terminates and a disorder is
declared. It will be shown later that this interpretation can be useful in computing
the performance of Page’s test.⁵
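The equivalence between the running-minimum form (2.2) and the recursion (2.4)-(2.5) can be checked numerically. The following is a minimal sketch, assuming unit-variance Gaussian observations with a mean shift from 0 to 1; the data, the threshold value, and all function names are illustrative assumptions rather than part of the thesis.

```python
import random

def llr_gaussian(x, mu0=0.0, mu1=1.0, sigma=1.0):
    """Log-likelihood ratio log(f1(x)/f0(x)) for a Gaussian mean shift (assumed model)."""
    return (mu1 - mu0) * (x - 0.5 * (mu0 + mu1)) / sigma ** 2

def page_recursive(samples, h, g=llr_gaussian):
    """Stopping time of (2.4)-(2.5): S_n = max(S_{n-1} + g(X_n), 0), stop when S_n > h."""
    s = 0.0
    for n, x in enumerate(samples, start=1):
        s = max(s + g(x), 0.0)
        if s > h:
            return n
    return None  # no alarm within the available samples

def page_running_min(samples, h, g=llr_gaussian):
    """Stopping time of (2.2): stop when T_1^n - min_{0<=k<=n} T_1^k > h."""
    t = 0.0      # T_1^n, with T_1^0 = 0 by convention
    t_min = 0.0  # running minimum of T_1^k over 0 <= k <= n
    for n, x in enumerate(samples, start=1):
        t += g(x)
        t_min = min(t_min, t)
        if t - t_min > h:
            return n
    return None

# Hypothetical data: 200 pre-disorder samples (mean 0), then 100 post-disorder samples (mean 1).
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(1.0, 1.0) for _ in range(100)]

n_rec = page_recursive(data, h=5.0)
n_min = page_running_min(data, h=5.0)
print("alarm time, recursive form:", n_rec)
print("alarm time, running-minimum form:", n_min)
```

Both functions keep only one or two scalars of state, which is the sense in which the procedures are suitable for on-line use; the recursive form additionally keeps its statistic bounded below by zero.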

It turns out that Page’s test implemented with $g(x) = \log\frac{f_1(x)}{f_0(x)}$ not only can be
interpreted as a recursive ML procedure, but it is in fact the optimal procedure as
explained below. Let $N$ denote the stopping time of any procedure designed to detect
the disorder. Define the following quantities:

$$T = E_0 N$$

$$D = \sup_{k \ge 1}\, \operatorname{ess\,sup}\, E_k\left[ (N - k + 1)^+ \mid X_1, \ldots, X_{k-1} \right] = E_1 N$$

where $E_0$ is the expectation under $\theta = \theta_0$, and $E_k$, for $k \ge 1$, is the expectation under
the distribution of the observations when the change from $\theta_0$ to $\theta_1$ occurs at time $k$.
$T$ is called the mean time between false alarms (MFA), and $D$ is the worst expected
delay in detecting the disorder.

⁵ In some cases the disorder is two-sided, such as a shift in the mean which may be either positive
or negative. The approach here is to implement two Page’s tests in parallel, one for each possible
change direction, and declare a disorder when an alarm sounds in either test. In general, $K$ parallel
Page’s tests may be used whenever there are $K$ alternative hypotheses $H_k$, $k = 1, \ldots, K$.

Let $N^*$ denote the stopping time of Page’s procedure using the log-likelihood ratio. In
[13], Lorden derives two key asymptotic results:

(R1) Select the threshold $h$ in (2.4) such that

$$E_0 N^*(\gamma) \ge \gamma$$

Then as $\gamma \to \infty$,

$$E_1 N^*(\gamma) \sim \frac{\log \gamma}{I(f_1, f_0)}$$

where $I(f_1, f_0) = E[g(X) \mid H_1]$ is the Kullback-Leibler divergence between $f_1$ and $f_0$.

(R2) No stopping time $N$ satisfying $E_0 N \ge \gamma$ achieves a worst expected delay $D$ that is
asymptotically smaller than $\log \gamma / I(f_1, f_0)$ as $\gamma \to \infty$.

The first result characterizes the asymptotic performance of (2.4)-(2.5) as the MFA
becomes large, a condition usually desired in real situations, since one would like the
false alarms to be as infrequent as possible. The second result shows that no other
test is asymptotically better than that in (2.4)-(2.5). In fact, it was later shown in
[14] and [17] that this procedure is also optimal in the non-asymptotic sense; that is,
$D$ is minimized for any fixed $T$.
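The pair $(T, D)$ can also be estimated by simulation. The sketch below is a simple Monte Carlo estimate for Page’s test with the log-likelihood ratio, assuming unit-variance Gaussian data with $\theta_0 = 0$ and $\theta_1 = 1$; the threshold, trial count, seed, and function names are assumptions chosen for illustration. The delay is estimated with the disorder at time 1, which is the case $D = E_1 N$ given above.

```python
import random

def run_length(rng, true_mean, h, mu0=0.0, mu1=1.0):
    """Samples needed until S_n = max(S_{n-1} + g(X_n), 0) exceeds h,
    for unit-variance Gaussian data with the given true mean."""
    s, n = 0.0, 0
    while True:
        n += 1
        x = rng.gauss(true_mean, 1.0)
        # g(x) is the Gaussian log-likelihood ratio for a shift mu0 -> mu1
        s = max(s + (mu1 - mu0) * (x - 0.5 * (mu0 + mu1)), 0.0)
        if s > h:
            return n

def estimate_T_and_D(h, trials=2000, seed=1):
    """Monte Carlo estimates of T = E_0 N and D = E_1 N for threshold h."""
    rng = random.Random(seed)
    T_hat = sum(run_length(rng, 0.0, h) for _ in range(trials)) / trials
    D_hat = sum(run_length(rng, 1.0, h) for _ in range(trials)) / trials
    return T_hat, D_hat

T_hat, D_hat = estimate_T_and_D(h=3.0)
print("estimated MFA T:", T_hat)
print("estimated worst expected delay D:", D_hat)
```

Because the run length under $H_0$ has a standard deviation comparable to its mean, the estimate of $T$ converges slowly; the cost of simulation grows roughly in proportion to $T$, which is one reason bounds and direct computation are attractive when false alarms are rare.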

Thus far, we have focused on the case where $g(\cdot)$ is the log-likelihood ratio, which
results in the optimal version of Page’s test. In some cases, though, the actual densities
of the observations are not known, so the exact form of the log-likelihood ratio
is not known and the optimal test cannot be implemented (for example, the exact
values of $\theta_0$ and $\theta_1$ might be unknown). Therefore, it is also useful to consider the
more general version of Page’s test where $g(\cdot)$ is arbitrary.

In [5], nonparametric versions of Page’s test are considered for the case where $g(\cdot)$
is the sign detector

$$g(x) = \begin{cases} -1, & x < 0 \\ 1, & x \ge 0 \end{cases}$$

and the dead-zone nonlinearity

$$g(x) = \begin{cases} -1, & x < -d \\ 0, & |x| \le d \\ 1, & x > d \end{cases}$$

where  d  >  0.  It  is  shown  that  these  nonparametric  quickest  detectors  are  useful 
in  cases  where  the  underlying  noise  distributions  are  heavy-tailed.  In  Chapters  3 
and  4  of  this  thesis,  robust  alternatives  for  quickest  detection  are  investigated.  Such 
techniques  are  useful  when  the  noise  is  only  partially  characterized,  and  the  goal  is 
then to maximize the worst case performance. In this case, the nonlinearity $g(\cdot)$ is
the solution of a minimax problem, and it turns out to be the log-likelihood ratio of the
least favorable distributions.

It will be necessary to characterize the performance of Page’s test for arbitrary
$g(\cdot)$. How this can be accomplished is the subject of the next two sections.

2.3 The Asymptotic Performance Measure

In designing a quickest detection procedure, one is interested in minimizing $D$ for any
operating point $T$, and this minimum occurs when $g$ is the log-likelihood ratio between
$f_0$ and $f_1$. Notice also that when this is the case, (R1) says that the worst expected
delay is a logarithmic function of the MFA for large $T$. Therefore the performance of
the optimal Page’s test can be asymptotically characterized by the quantity

$$\eta = \lim_{T \to \infty} \frac{\log T}{D} \tag{2.6}$$

This quantity is called the asymptotic performance measure of Page’s test, and (R1)
implies that for the optimal Page test, $\eta = I(f_1, f_0)$.⁶
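As a worked illustration, consider the special case (assumed here, not taken from the text above) in which $f_0$ and $f_1$ are Gaussian densities with common variance $\sigma^2$ and means $\theta_0 < \theta_1$. The log-likelihood ratio and the resulting performance measure are then

```latex
g(x) = \log\frac{f_1(x)}{f_0(x)}
     = \frac{\theta_1 - \theta_0}{\sigma^2}\left(x - \frac{\theta_0 + \theta_1}{2}\right),
\qquad
\eta = I(f_1, f_0) = E[g(X) \mid H_1] = \frac{(\theta_1 - \theta_0)^2}{2\sigma^2}.
```

For a jump of one standard deviation, $\eta = 1/2$, so by (2.6) the worst expected delay grows roughly like $2 \log T$. At $T = 10^6$, $\log T \approx 13.8$, which suggests a delay on the order of 28 samples; this first-order figure ignores lower-order terms and is only indicative for finite $T$.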

⁶ Alternatively, the limit in (2.6) as $T \to \infty$ can be evaluated as either $h \to \infty$ or as $D \to \infty$,
since any one of these implies the other two.

That $\eta$ describes the asymptotic performance of Page’s test can be seen by observing
that $1/\eta$ is the slope of the plot of $D$ versus $\log T$, as $T \to \infty$. In particular,
for large $T$, we have the approximation

$$D \approx \frac{\log T}{\eta} \tag{2.7}$$

and so to minimize $D$, one needs to maximize $\eta$. Note that $\eta$ is not a function of the
threshold $h$, since the limit was taken as $h \to \infty$; this means that we have eliminated
a variable from the optimization problem. However, one must make sure that the
desired $T$ is large enough so that (2.7) is valid. Luckily, in most practical problems,
one is interested in designing procedures with few false alarms, resulting in large $T$.

As mentioned previously, Page’s test can also be defined for arbitrary $g(x)$, and
so it would also be useful to characterize the asymptotic performance for this case.
Unfortunately, it is not clear how one would compute $\eta$ for the generalized Page test,
although we know from (R2) that $\eta \le I(f_1, f_0)$. To address this problem, Broder [5]
showed that the lower bound $\tilde{\eta} \le \eta$ can be defined as follows:

$$\tilde{\eta} = \omega_0\, E\{ g(x) \mid f_1 \} \tag{2.8}$$

where $\omega_0$ is the unique non-zero root of the moment generating function equality

$$E\{ e^{\omega_0 g(x)} \mid f_0 \} = 1$$

An asymptotic upper bound on $D$ can now be obtained as

$$D \approx \frac{\log T}{\eta} \le \frac{\log T}{\tilde{\eta}}$$

Therefore, the upper bound on the worst expected delay can be minimized by selecting
$\tilde{\eta}$ to be as large as possible.

It is shown in [5] that a sufficient condition for $\tilde{\eta}$ to be maximized (i.e., it equals
$\eta$) is that $g$ be the log-likelihood ratio. In this case, $\omega_0 = 1$, and $\tilde{\eta}$ directly reduces
to the Kullback-Leibler divergence as expected. It is also shown that $\tilde{\eta}$ is invariant
to changes in scale; thus, $\tilde{\eta} = \eta$ when $g(x) = C \log\frac{f_1(x)}{f_0(x)}$ for any $C > 0$. In Appendix
A, we use variational calculus techniques to show the converse of this: no other
choice of $g(x)$ will make $\tilde{\eta} = \eta$; thus, $g(x) = C \log\frac{f_1(x)}{f_0(x)}$ is also a necessary condition.

The lower bound $\tilde{\eta}$ is useful for several reasons. First, it can be computed for any
choice of $g(\cdot)$, enabling side-by-side comparisons of different tests. Second, as will be
shown in later chapters, it is not difficult to compute. Finally, it enables us to obtain
an upper bound on $D$ for any (large) fixed $T$; thus, as will be seen in later chapters,
a designer can use $\tilde{\eta}$ to quickly compute the approximate performance of a procedure
which uses any nonlinearity $g(\cdot)$. We will use $\tilde{\eta}$ often in the next four chapters.
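The bound in (2.8) is straightforward to evaluate numerically. The following sketch, assuming unit-variance Gaussian noise with $\theta_0 = 0$ and $\theta_1 = 1$, finds $\omega_0$ by bisection on the moment generating function equality and then forms $\tilde{\eta}$. The integration grid, the bracketing strategy, the function names, and the offset sign detector used for comparison are all illustrative assumptions.

```python
import math

def expect(func, mean, lo=-12.0, hi=12.0, step=0.01):
    """Trapezoidal approximation of E[func(X)] for X ~ N(mean, 1)."""
    n = int(round((hi - lo) / step))
    total = 0.0
    for i in range(n + 1):
        x = lo + i * step
        w = 0.5 if i in (0, n) else 1.0
        pdf = math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)
        total += w * func(x) * pdf
    return total * step

def eta_lower_bound(g, mu0=0.0, mu1=1.0):
    """Return (omega0, eta_tilde) as in (2.8), where omega0 > 0 solves
    E[exp(omega0 * g(X)) | f0] = 1. Assumes E[g | f0] < 0 < E[g | f1]."""
    def m(omega):
        return expect(lambda x: math.exp(omega * g(x)), mu0) - 1.0
    hi = 1.0
    while m(hi) <= 0.0:   # expand until beyond the non-zero root
        hi *= 2.0
    lo = hi
    while m(lo) >= 0.0:   # shrink until inside the region where m < 0
        lo *= 0.5
    for _ in range(50):   # bisection on the bracket [lo, hi]
        mid = 0.5 * (lo + hi)
        if m(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    omega0 = 0.5 * (lo + hi)
    return omega0, omega0 * expect(g, mu1)

llr = lambda x: x - 0.5                       # log(f1/f0) for the assumed Gaussian pair
scaled = lambda x: 3.0 * (x - 0.5)            # C * log(f1/f0) with C = 3
sign_mid = lambda x: 1.0 if x >= 0.5 else -1.0  # hypothetical sign detector about the midpoint

w_llr, eta_llr = eta_lower_bound(llr)         # analytically: omega0 = 1, eta_tilde = 1/2
w_scaled, eta_scaled = eta_lower_bound(scaled)  # analytically: omega0 = 1/3, eta_tilde = 1/2
w_sign, eta_sign = eta_lower_bound(sign_mid)
print(w_llr, eta_llr)
print(w_scaled, eta_scaled)
print(w_sign, eta_sign)
```

The first two cases illustrate the scale invariance noted above: multiplying $g$ by $C$ divides $\omega_0$ by $C$ and leaves $\tilde{\eta}$ unchanged. The sign detector yields a smaller $\tilde{\eta}$ under Gaussian noise, consistent with the log-likelihood ratio being the maximizer.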

2.4  Methods  of  Performance  Computation 

We have seen that the performance of Page’s test is characterized by the pair $(T, D)$.
Therefore, a natural way to compare several procedures is to compare the plots of
$D$ versus $T$ for each one. It was shown in the previous section that the relationship
between $T$ and $D$ can be approximated via the computation of $\tilde{\eta}$ as shown above. We
now discuss how $T$ and $D$ can be obtained directly.

“Direct” Computation

Let $\mathcal{N}_z(\theta)$ denote the average sample number (ASN) of a CUSUM procedure whose
initial score is $z$ (i.e., $S_0 = z$). When the procedure begins, $S_0 = 0$, and so

$$T = \mathcal{N}_0(\theta_0)$$

Also notice that since $S_n \ge 0$, $\forall n$, the worst mean delay corresponds to the case
where $S_{m-1} = 0$; therefore,

$$D = \mathcal{N}_0(\theta_1)$$

As  stated  earlier,  Page’s  test  can  be  viewed  as  a  repeated  application  of  a  SPRT 
with  lower  boundary  0  and  upper  boundary  h: 

$$s_n = s_{n-1} + g(X_n), \qquad s_0 = z$$

$$M = \inf\{n \mid s_n \notin [0, h]\}$$

Also  define 

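$$\mathcal{M}_z(\theta) = E[M \mid \theta]$$

and

$$\mathcal{P}_z(\theta) = \Pr\{s_M < 0 \mid \theta\}$$

$\mathcal{M}_z(\theta)$ is the ASN of the SPRT, and $\mathcal{P}_z(\theta)$ is its operating characteristic, which is the
probability that the SPRT will terminate at the lower boundary. It is not difficult to
show that [15]:

$$\mathcal{N}_0(\theta) = \frac{\mathcal{M}_0(\theta)}{1 - \mathcal{P}_0(\theta)}$$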

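and so $T$ and $D$ can be written as

$$T = \mathcal{N}_0(\theta_0) = \frac{\mathcal{M}_0(\theta_0)}{1 - \mathcal{P}_0(\theta_0)}$$

and

$$D = \mathcal{N}_0(\theta_1) = \frac{\mathcal{M}_0(\theta_1)}{1 - \mathcal{P}_0(\theta_1)}$$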
Define the transformation of random variables $Y = g(X)$, and let $f(y; \theta)$ and
$F(y; \theta)$ denote the density and distribution of $Y$, respectively, conditioned on $\theta$. The
functions $\mathcal{M}_z(\theta)$ and $\mathcal{P}_z(\theta)$ satisfy the following Fredholm integral equations

$$\mathcal{P}_z(\theta) = F(-z; \theta) + \int_0^h \mathcal{P}_y(\theta) f(y - z; \theta)\,dy$$

$$\mathcal{M}_z(\theta) = 1 + \int_0^h \mathcal{M}_y(\theta) f(y - z; \theta)\,dy$$

where $0 \le z \le h$. Unfortunately, no analytical solutions can be found for these
equations. However, they can be approximated by discretizing the integral. The
solution is determined by solving the system of linear equations

$$\mathcal{P}_{z_j}(\theta) = F(-z_j; \theta) + \sum_{k=1}^{K} w_k \mathcal{P}_{z_k}(\theta) f(z_k - z_j; \theta) \qquad (2.9)$$

$$\mathcal{M}_{z_j}(\theta) = 1 + \sum_{k=1}^{K} w_k \mathcal{M}_{z_k}(\theta) f(z_k - z_j; \theta) \qquad (2.10)$$

for $j = 1, \ldots, K$, where $0 \le z_1 < z_2 < \ldots < z_{K-1} < z_K \le h$, and where $w_1, \ldots, w_K$ is
a set of weights chosen according to some rule. For example, when $\{z_k\}$ and $\{w_k\}$ are
the roots and corresponding coefficients of the Legendre polynomial, (2.9)-(2.10) is
called the Nyström approximation to the Fredholm equations of the second kind [6].
One could also use a simple rectangular approximation, which reduces the integrals
to Riemann sums [5].

Markov  Approximation 

The ASN of Page’s test can be expressed in another way. Let $r_i(n)$ denote the
probability that stage $n$ will be reached when $\theta = \theta_i$; that is,

$$r_i(n) = \Pr\{S_1, \ldots, S_{n-1} \in [0, h] \mid \theta_i\}$$

Now

$$\mathcal{N}_0(\theta_i) = \sum_{n=1}^{\infty} n\,(r_i(n) - r_i(n+1)) = \sum_{n=1}^{\infty} r_i(n) \qquad (2.11)$$

Notice that $r_i(n) - r_i(n+1)$ is the probability that the test terminates in stage $n$.

The $r_i(n)$ can be computed using a finite state Markov chain approximation to
Page’s test, an approach introduced in [7]. The interval $[0, h]$ is divided into a total
of $p$ small bins of equal size, and each bin corresponds to a single state: specifically,
state $a_j$ corresponds to the subinterval $(x_{j-1}, x_j]$, where $x_j = jh/p$, $j = 1, \ldots, p$. The
probability transition matrix, $Q$, is formed, where element $Q_{ij}$ denotes the probability
of the test statistic $S_n$ going from state $a_i$ at time $n - 1$ to state $a_j$ at time $n$.
Two additional states are also included. The first is the starting state, $a_0$, which
corresponds to $S_n = 0$. The second is the terminal state, $a_*$, corresponding to the
interval $(h, \infty)$; an alarm sounds whenever the terminal state is reached. It is shown
in [7] that the structure of $Q$ is

$$Q = \begin{bmatrix} R & (I - R)\mathbf{1} \\ \mathbf{0}^T & 1 \end{bmatrix}$$

where $\mathbf{0}$ and $\mathbf{1}$ are $(p+1)$-dimensional column vectors of all zeros and ones, respectively.
Separate transition matrices must be computed for $\theta = \theta_0$ and $\theta = \theta_1$: let
these be $Q_i$, $i = 0, 1$, respectively, with corresponding submatrices $R_i$.

Let $\pi_n$ denote the state probability vector at stage $n$:

$$\pi_n = [\Pr\{S_n \in a_0\}, \ldots, \Pr\{S_n \in a_p\}, \Pr\{S_n \in a_*\}]$$

The successive state probabilities can be computed recursively [11] as

$$\pi_n = \pi_{n-1} Q_i = \pi_0 Q_i^n \qquad (2.12)$$

where $\pi_0 = [1, 0, \ldots, 0]$ is the initial state probability vector. Equation (2.12) can be
simplified to

$$r_i(n) = \tilde{\pi}_0 R_i^{n-1} \mathbf{1}, \qquad n = 1, 2, \ldots \qquad (2.13)$$

where $\tilde{\pi}_0 = [1, 0, \ldots, 0]$ (dimension $p + 1$). Note that $r_i(1) = 1$; that is, every test
always requires at least one stage. Substituting (2.13) into (2.11), we have


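$$\mathcal{N}_0(\theta_i) = \sum_{n=1}^{\infty} \tilde{\pi}_0 R_i^{n-1} \mathbf{1} = \tilde{\pi}_0 (I - R_i)^{-1} \mathbf{1} \qquad (2.14)$$

and so $T \approx \tilde{\pi}_0 (I - R_0)^{-1} \mathbf{1}$ and $D \approx \tilde{\pi}_0 (I - R_1)^{-1} \mathbf{1}$.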


Monte  Carlo  Simulation 


Finally, the ASN of Page’s test can be obtained via Monte Carlo simulation in a
straightforward manner. As mentioned above, $T$ and $D$ are just the ASNs of Page’s
test with initial score zero, which can be approximated by the average of the stopping
times of $K$ independent runs. Let $N_k$ denote the stopping time of run $k$. An unbiased
estimate of the ASN is

$$\hat{\mathcal{N}} = \frac{1}{K} \sum_{k=1}^{K} N_k$$

where $\hat{\mathcal{N}} \approx T$ when the samples are generated under $f(x; \theta_0)$ and $\hat{\mathcal{N}} \approx D$ when the
samples are generated under $f(x; \theta_1)$. The Monte Carlo method will be particularly
useful in Chapter 7, where we investigate an adaptive procedure which is not a version
of Page’s test, and for which the other methods cannot be applied directly.
2.5 Other Applications of Quickest Detection Procedures

To conclude this chapter, a brief survey of other areas where quickest detection
procedures are applicable is given.

Closely related to the disorder problem discussed above is the problem of detecting
transient signals. In this case, two shifts in the mean occur: from $\theta_0$ to $\theta_1$ and then
back to $\theta_0$. It is shown in [5] that the ML optimal procedure for the transient problem
is again Page’s test.

For the more general problem of correlated observations, a version of Page’s test
can be obtained by replacing the nonlinearity $g$ by

$$g_n(X_n) = \log \frac{f_1(X_n \mid X_1^{n-1})}{f_0(X_n \mid X_1^{n-1})}$$

Thus, $g$ is no longer memoryless, but now is a function of the past data. An
autoregressive moving average (ARMA) model is commonly used to model correlated
data. Here, the observations $Y_k$ arise from the model

$$Y_k = \sum_{i=1}^{p} a_i Y_{k-i} + \sum_{j=0}^{q} b_j V_{k-j}$$

where $V_k$ is a sequence of white Gaussian noise. The ARMA model is useful in
spectrum modelling applications [10]. It can also be used to detect changes in
spectral characteristics. Such changes correspond to a shift in the parameter set
$\{a_1, \ldots, a_p, b_0, \ldots, b_q\}$. Examples include shifts in seismic, speech, and biomedical
signals.

Most research in quickest detection has focused on the problem where the disorder
is a shift from one stationary process to another. However, in some problems of
practical interest, the change may be time-varying. An example of this is the detection
of sinusoidal signals for the purpose of carrier synchronization. In this case, the data
is not accurately modelled as a step change in the mean. In [4], Blostein derives a
procedure that is suitable for detecting time-varying changes in the mean. Here, it is
assumed that the mean before and after the disorder are known, and that the amplitude
of the mean is at least approximately known. The procedure is similar to the
time-varying version of Page’s test, with the benefit that it can be implemented
recursively. While this test is not optimal in the sense of Lorden [13], simulations reveal
that the procedure works well for detecting sinusoidal signals of unknown amplitude
in Gaussian noise.⁷

Even more difficult is the problem of detecting changes in systems where the statistics
are not easily characterized, or where more than just the mean of the distributions
is nonstationary. For example, in [1], the problem of detecting changes in geophysical
systems is examined. These types of signals exhibit a high degree of nonstationarity
(e.g., alternating segments of high and low variance) even when no disorder is present.
Such signals can also arise in biomedical, speech, and image processing applications.

Work has also been done in detecting changes in the parameters of state-space
systems. A typical example is a Kalman filtering application, where one wishes to
track some phenomenon that is subject to sudden changes, such as a maneuvering
target. In [21], a generalized likelihood ratio (GLR) approach is introduced to handle
this problem. The presence of a disorder can be determined by monitoring the filter
residual process: the residual is white Gaussian noise, with zero mean before the
disorder and nonzero mean afterwards. When a disorder is detected, an estimate of
the disorder magnitude is determined and used to adjust the model parameters; in
essence, the model is bootstrapped for the new statistics.⁸ The application of the
GLR procedure to geophysical signals is discussed in [1]. A survey of failure detection
in dynamic systems is given in [20].

⁷Lorden’s proof of optimality requires that the samples before and after the disorder time be
independent and identically distributed.

⁸It is assumed that the system is observable so that any change in the state variables will show
up in the residual signal.


2.6  Appendix 

2.6.A The Log-likelihood Ratio is Necessary and Sufficient
to Maximize $\tilde{\eta}$

Proposition 1: A necessary and sufficient condition that $\tilde{\eta}$ is maximized is that

$$g(x) = C \log \frac{f_1(x)}{f_0(x)}$$

for some $C > 0$.

Proof:

($\Leftarrow$)

To prove sufficiency, simply let $g(x) = C \log \frac{f_1(x)}{f_0(x)}$. We have

$$\tilde{\eta} = \omega_0 C \int_{-\infty}^{\infty} \log \frac{f_1(x)}{f_0(x)}\, f_1(x)\,dx = \omega_0 C\, I(f_1, f_0) \qquad (A.1)$$

where $\omega_0$ satisfies

$$\int_{-\infty}^{\infty} \exp\left\{\omega_0 C \log \frac{f_1(x)}{f_0(x)}\right\} f_0(x)\,dx = \int_{-\infty}^{\infty} \left[\frac{f_1(x)}{f_0(x)}\right]^{\omega_0 C} f_0(x)\,dx = 1$$

The latter implies $\omega_0 = C^{-1}$. Therefore, (A.1) becomes

$$\tilde{\eta} = I(f_1, f_0)$$

In addition, Lorden has shown [13] that optimal performance over all possible choices
of $g(x)$ is

$$\eta = I(f_1, f_0)$$

and that this occurs when $g(x)$ is the log-likelihood ratio. Therefore, $\tilde{\eta}$ is maximized.


($\Rightarrow$)

To show the converse, consider the following constrained optimization problem:

$$\max_{g} \int_{-\infty}^{\infty} g(x) f_1(x)\,dx \quad \text{subject to} \quad \int_{-\infty}^{\infty} \exp\{\omega_0 g(x)\} f_0(x)\,dx = 1$$

where $\omega_0$ is any fixed positive real number. This is a so-called isoperimetric problem
from variational calculus [8, 12, 19]. The solution is obtained by first incorporating the
side constraint via the Lagrange multiplier method, and then applying the standard
calculus of variations optimization procedure.

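The goal is to determine the $g(x)$ for which the integral

$$\int_{-\infty}^{\infty} \left[\omega_0 g(x) f_1(x) + \lambda \exp\{\omega_0 g(x)\} f_0(x)\right] dx \qquad (A.2)$$

is stationary. Here, $\lambda$ is the Lagrange multiplier associated with the side constraint.
Suppose that $\hat{g}(x)$ is the function which maximizes $\tilde{\eta}$, and consider the nonlinearity

$$g(x) = \hat{g}(x) + \epsilon \cdot \delta g(x)$$

where $\delta g(x)$ is an arbitrary variation in the neighborhood of $\hat{g}(x)$. Substituting this
into (A.2), we have

$$K(\epsilon) = \int_{-\infty}^{\infty} \left\{\omega_0 \left[\hat{g}(x) + \epsilon \cdot \delta g(x)\right] f_1(x) + \lambda e^{\omega_0 [\hat{g}(x) + \epsilon \cdot \delta g(x)]} f_0(x)\right\} dx$$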


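Now, a necessary condition to get a stationary point is

$$\left.\frac{dK(\epsilon)}{d\epsilon}\right|_{\epsilon=0} = 0$$

The derivative can be taken inside the integral, and thus we have

$$\left.\frac{dK(\epsilon)}{d\epsilon}\right|_{\epsilon=0} = \int_{-\infty}^{\infty} \left[\omega_0 \delta g(x) f_1(x) + \lambda \omega_0 \delta g(x) e^{\omega_0 \hat{g}(x)} f_0(x)\right] dx = 0$$

Rearranging terms, we have

$$\omega_0 \int_{-\infty}^{\infty} \delta g(x) \left[f_1(x) + \lambda e^{\omega_0 \hat{g}(x)} f_0(x)\right] dx = 0$$

In order for the equality to hold for arbitrary variations $\delta g$, the expression within
brackets must be zero for all $x$. Therefore, the necessary condition becomes

$$f_1(x) + \lambda e^{\omega_0 \hat{g}(x)} f_0(x) = 0$$

which can be rearranged to get

$$\frac{f_1(x)}{f_0(x)} = e^{\log(-\lambda) + \omega_0 \hat{g}(x)}$$

and thus

$$\omega_0 \hat{g}(x) = \log \frac{f_1(x)}{f_0(x)} - \log(-\lambda) \qquad (A.3)$$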


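We can now use the equality constraint to determine $\lambda$. Observe that

$$\int_{-\infty}^{\infty} \exp\{\omega_0 \hat{g}(x)\} f_0(x)\,dx = \int_{-\infty}^{\infty} \left(-\frac{1}{\lambda}\right) \frac{f_1(x)}{f_0(x)} f_0(x)\,dx = -\frac{1}{\lambda} \int_{-\infty}^{\infty} f_1(x)\,dx = -\frac{1}{\lambda}$$

Since the constraint requires this integral to equal one, $\lambda = -1$, and (A.3) becomes

$$\hat{g}(x) = \frac{1}{\omega_0} \log \frac{f_1(x)}{f_0(x)}$$

which is just the log-likelihood ratio scaled by the factor $\frac{1}{\omega_0}$. Finally, $\omega_0$ was chosen
arbitrarily; therefore the optimal nonlinearity is the log-likelihood ratio scaled by any
positive constant.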
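That $\hat{g}(x)$ results in a maximum (rather than some other stationary point) can
be seen by noting that

$$\left.\frac{d^2 K(\epsilon)}{d\epsilon^2}\right|_{\epsilon=0} = \left.\frac{d}{d\epsilon} \int_{-\infty}^{\infty} \left[\omega_0 \delta g(x) f_1(x) + \lambda \omega_0 \delta g(x) e^{\omega_0 [\hat{g}(x) + \epsilon \cdot \delta g(x)]} f_0(x)\right] dx\,\right|_{\epsilon=0} = \lambda \omega_0^2 \int_{-\infty}^{\infty} \delta g^2(x) e^{\omega_0 \hat{g}(x)} f_0(x)\,dx \qquad (A.4)$$

Clearly the integral is strictly positive and we have seen that $\lambda = -1$. Therefore,

$$\left.\frac{d^2 K(\epsilon)}{d\epsilon^2}\right|_{\epsilon=0} < 0$$

and so the stationary point is in fact a maximum.

Also, it is easy to verify that when $g(x) = \hat{g}(x)$, $\tilde{\eta} = I(f_1, f_0)$. Therefore, $\tilde{\eta} = \eta$. ■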
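The proposition can be checked numerically in a simple case. The sketch below is an illustration only: it assumes Gaussian densities $f_0 = N(-\theta, \sigma^2)$ and $f_1 = N(\theta, \sigma^2)$ with $\theta = 0.5$ and $\sigma = 1$, uses a trapezoid rule on a truncated range, and finds the positive root $\omega_0$ of the moment generating function condition by bisection. All function names are hypothetical. For these densities the log-likelihood ratio is $2\theta x / \sigma^2$ and the K-L divergence is $2\theta^2/\sigma^2 = 0.5$.

```python
import math

def gauss_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def integrate(func, lo=-12.0, hi=12.0, n=2000):
    """Composite trapezoid rule on the truncated range [lo, hi]."""
    step = (hi - lo) / n
    total = 0.5 * (func(lo) + func(hi))
    for i in range(1, n):
        total += func(lo + i * step)
    return total * step

def eta_lower_bound(g, f0, f1):
    """w0 * E[g(X) | H1], where w0 > 0 solves E[exp(w0 g(X)) | H0] = 1."""
    def mgf_minus_one(w):
        return integrate(lambda x: math.exp(w * g(x)) * f0(x)) - 1.0
    # The condition has a trivial root at w = 0; bracket the positive root
    lo, hi = 1e-6, 1e-3
    while mgf_minus_one(hi) < 0.0 and hi < 1e3:
        lo, hi = hi, 2.0 * hi
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if mgf_minus_one(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    w0 = 0.5 * (lo + hi)
    return w0 * integrate(lambda x: g(x) * f1(x))

theta, sigma = 0.5, 1.0
f0 = lambda x: gauss_pdf(x, -theta, sigma)
f1 = lambda x: gauss_pdf(x, theta, sigma)

if __name__ == "__main__":
    for C in (0.5, 1.0, 3.0):
        scaled_llr = lambda x, C=C: C * 2.0 * theta * x / sigma ** 2
        print("C =", C, "eta~ =", eta_lower_bound(scaled_llr, f0, f1))
    sign = lambda x: (x > 0) - (x < 0)
    print("sign detector eta~ =", eta_lower_bound(sign, f0, f1))
```

Under these assumptions every scaling $C$ of the log-likelihood ratio should return the K-L divergence, while a different nonlinearity such as the sign detector returns a smaller value.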
Chapter 3


Robust  Quickest  Detection 


3.1  Introduction 

In the previous chapter, it was shown that optimal procedures for quickest detection
exist when the noise distribution is known and the samples are independent.
Unfortunately, in practice the true noise distribution is often not known precisely. This
leads to two questions. First, suppose a procedure is optimal for a particular noise
distribution. How sensitive is the performance of this procedure when the true noise
distribution deviates from the assumed distribution? Second, if it is only known that
the true distribution lies within some noise uncertainty class, what then is the optimal
procedure? Both of these questions are addressed in this chapter.

The disorder problem for a shift in the mean with noise uncertainty is very similar
to that for known noise characteristics stated in Chapter 2. As before, assume that
a sequence of independent random variables $X_1, X_2, \ldots$ is observed sequentially. Let
$H_0$ and $H_1$ denote the hypotheses “no disorder is present” and “the disorder has
occurred,” respectively. At the disorder time $m$, a one-time shift in the mean from
$-\theta$ to $\theta$ occurs, where $\theta > 0$. Let the noise distributions before and after the disorder
be $f_0$ and $f_1$, respectively. If these distributions are known, we have seen in Chapter
2 that the optimal procedure is Page’s test implemented using the log-likelihood ratio.

In this chapter, however, it is assumed that, rather than having perfect knowledge
of the noise characteristics, $f_0$ and $f_1$ are instead known only to lie within some noise
classes $\mathcal{F}_0$ and $\mathcal{F}_1$. For this case, the disorder problem involves detecting as quickly
as possible after time instant $i = m$ that a shift from hypothesis $H_0$ to hypothesis $H_1$
has occurred, where:

$$H_0 : X_i \sim f_0 \in \mathcal{F}_0, \quad i = 1, 2, \ldots, m - 1$$

$$H_1 : X_i \sim f_1 \in \mathcal{F}_1, \quad i = m, m + 1, \ldots$$

As before, the goal is to minimize the expected delay in detecting the disorder, $D$,
subject to a lower bound on the mean time between false alarms (MFA), $T$. However,
it is necessary to be more specific about what it means to achieve optimal performance
when the noise pair $(f_0, f_1)$ is known only to lie within some class $\mathcal{F}_0 \times \mathcal{F}_1$.

All  of  the  noise  uncertainty  classes  that  will  be  considered  here  consist  of  a  specific 
nominal  distribution  together  with  some  type  of  allowable  uncertainty;  in  particular, 
attention  will  be  paid  to  the  case  of  Gaussian  nominals,  due  to  their  widespread 
applicability.  One  design  option  is  to  implement  the  procedure  which  is  optimal  for 
the  nominal  distribution  and  assume  (hope!)  that  the  performance  will  be  similar 
in  the  case  where  the  true  noise  deviates  from  this  nominal.  If  the  size  of  the  noise 
classes  is  small,  this  design  philosophy  may  prove  satisfactory;  on  the  other  hand,  if 
the  noise  classes  are  large,  it  is  not  clear  what  the  outcome  might  be.  At  the  very 
least,  it  would  be  nice  to  know  what  performance  results  when  the  true  noise  is  not 
the  nominal.  A  more  desirable  design  scheme  would  be  to  optimize  the  performance 
over  the  entire  uncertainty  class.  Unfortunately,  a  single  procedure  which  is  optimal 
for each distribution within the class does not usually exist.¹

An alternative design methodology is to determine the test which optimizes the
performance for the least favorable noise distributions in $\mathcal{F}_0$ and $\mathcal{F}_1$; this is the
well-known minimax design philosophy.² A disadvantage of this scheme is that the
performance will be less than optimal when the noise is in fact generated by the
nominal distributions; however, the worst case performance will be maximized. This
is the key reason for considering robust procedures: not only can one get reasonable
performance over the entire noise class, but the performance can also be guaranteed
to be at least some minimal value.

¹An exception to this, for example, is the noise class consisting of univariate Gaussian
distributions whose variance may lie on some interval. In this case, Page’s test implemented with $g(x) = x$
is optimal regardless of the true distribution; this follows from the invariance of $\eta$ with respect to
scale changes in $g(x)$, which was shown in Chapter 2.

In  Section  3.2,  the  solution  to  the  robust  quickest  detection  problem  is  given. 
The  approach  involves  applying  the  minimax  criterion  to  the  asymptotic  performance 
measure  introduced  in  Chapter  2.  It  is  shown  that  there  is  a  direct  connection  between 
robust  quickest  detection  and  robust  hypothesis  testing.  As  a  result,  many  of  the 
results  from  the  latter  can  be  applied  to  the  present  problem. 

The exact forms of the robust quickest detector for two noise uncertainty classes,
the ε-contaminated and total variation classes, are given in Section 3.3. Expressions
for a lower bound on the asymptotic performance are computed for: i) the robust
procedure, ii) the test which is optimal for the nominal distributions, and iii) two
nonparametric alternatives. The computation is done for several members of each
noise uncertainty class.

In Section 3.4, the lower bounds are evaluated for each of the noise/detector
combinations over a range of parameters. First, it is shown that the bounds are actually
good approximations to the true asymptotic performance. Next, the asymptotic
performance is computed for a range of signal to noise ratios. A useful figure of merit,
the robustness index, is also computed: this is a measure of the performance gain
(or loss) in opting for the robust procedure over each of the others. The effect of the
level of uncertainty assumed in the noise model is evaluated. The section concludes
with an example illustrating the utility of the asymptotic performance measure in
practical applications where high MFAs are desired.

²Technically, this should be called “maximin”, since the performance here is measured by a gain
function rather than a loss function. However, as is usually done, the term “minimax” will be used
throughout with the true idea being clear from the context.

In Section 3.5, the robust quickest detector for the weak signal case is determined.
Again, there is a strong connection with robust hypothesis testing, and therefore
some of the previous work can be exploited. The optimal robust detector is determined
for the ε-contaminated noise class by applying the minimax criterion directly
to the efficacy, which is proportional to the weak signal asymptotic performance. The
robustness index in this case simply reduces to the asymptotic relative efficiency
between the robust and alternative procedures. Finally, some performance curves are
computed to illustrate the benefits of the robust procedure in weak signal applications.

3.2  Robust  Asymptotic  Performance 

In Chapter 2, the asymptotic performance measure for Page’s test

$$\eta = \lim_{T \to \infty} \frac{\log T}{D} \qquad (3.1)$$

was defined,³ along with the lower bound

$$\tilde{\eta} = \omega_0 E[g(X) \mid H_1] \le \eta$$

where $\omega_0$ satisfies the moment generating function equality

$$E[\exp\{\omega_0 g(X)\} \mid H_0] = 1$$

In most situations where quickest detection procedures are applicable, one is interested
in procedures where false alarms occur infrequently, in other words, where $T$ is
large. In this case, we see from (3.1) that $T$ and $D$ can be related by

$$D \approx \frac{\log T}{\eta} \le \frac{\log T}{\tilde{\eta}}$$

³Unless noted otherwise, “log” denotes the natural logarithm.
Since the ultimate goal is to minimize $D$ for fixed $T$, then, for large $T$, an approximately
equivalent strategy is to maximize $\tilde{\eta}$.⁴ Similarly, in order to obtain the robust
quickest detector, the minimax criterion can be applied directly to $\tilde{\eta}$.
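To illustrate how the relation between $T$ and $D$ might be used to compare two nonlinearities, the sketch below evaluates $\tilde{\eta}$ in closed form for an assumed Gaussian shift from $-\theta$ to $\theta$ with standard deviation $\sigma$. For the log-likelihood ratio, $\tilde{\eta}$ is the K-L divergence $2\theta^2/\sigma^2$. For the sign detector $g(x) = \mathrm{sign}(x)$, the moment generating function equality reduces to $q e^{\omega_0} + (1 - q) e^{-\omega_0} = 1$ with $q = \Pr\{X > 0 \mid H_0\}$, whose positive root is $\omega_0 = \log[(1 - q)/q]$. The function names and the value $T = 10^6$ are illustrative assumptions.

```python
import math

def Phi(u):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def eta_llr_gaussian(theta, sigma):
    """Lower bound for the log-likelihood ratio: the K-L divergence."""
    return 2.0 * theta ** 2 / sigma ** 2

def eta_sign_gaussian(theta, sigma):
    """Lower bound for g(x) = sign(x) under the same Gaussian shift."""
    q = Phi(-theta / sigma)            # Pr{X > 0 | H0}
    w0 = math.log((1.0 - q) / q)       # positive root of q e^w + (1 - q) e^{-w} = 1
    return w0 * (1.0 - 2.0 * q)        # w0 * E[sign(X) | H1]

def approx_delay(T, eta):
    """Large-T approximation D ~ log(T) / eta."""
    return math.log(T) / eta

if __name__ == "__main__":
    T = 1.0e6
    for name, eta in (("log-likelihood ratio", eta_llr_gaussian(0.5, 1.0)),
                      ("sign detector", eta_sign_gaussian(0.5, 1.0))):
        print(name, "eta~ =", round(eta, 4), "D ~", round(approx_delay(T, eta), 2))
```

Since $\tilde{\eta}$ is a lower bound on $\eta$, the delay computed from it should be read as an approximate upper bound rather than an exact value.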

We wish to maximize the asymptotic performance of Page’s test for the least favorable
noise distributions $(f_{0L}, f_{1L}) \in \mathcal{F}_0 \times \mathcal{F}_1$. Recall that Page’s test involves the
computation of a cumulative sum test statistic, where each sample is processed by a
nonlinearity $g$, and that an alarm sounds when this statistic exceeds some threshold
$h$. The asymptotic performance measure describes the limiting behavior of the performance
as $T \to \infty$, which is also equivalent to $h \to \infty$ (or as $D \to \infty$). Therefore,
in designing a version of Page’s test for practical (i.e., for large $T$) applications, one
need only consider the choice of $g$.

Let $\mathcal{G}$ denote the set of all memoryless functions $g$. The direct minimax problem
is

$$\max_{g \in \mathcal{G}} \; \min_{(f_0, f_1) \in \mathcal{F}_0 \times \mathcal{F}_1} \tilde{\eta}(g; f_0, f_1) = \tilde{\eta}(g_R; f_{0L}, f_{1L}) \qquad (3.3)$$

In order to make it easier to solve this problem, we would like to find a saddlepoint
solution of (3.3); that is, we would like to determine some $(g_R, (f_{0L}, f_{1L}))$ that satisfies

$$\max_{g \in \mathcal{G}} \tilde{\eta}(g; f_{0L}, f_{1L}) = \tilde{\eta}(g_R; f_{0L}, f_{1L}) = \min_{(f_0, f_1) \in \mathcal{F}_0 \times \mathcal{F}_1} \tilde{\eta}(g_R; f_0, f_1) \qquad (3.4)$$

This allows the maximization and minimization to be performed separately, rather
than jointly, thus simplifying things considerably. The following proposition establishes
that a saddlepoint does exist for this problem:

Proposition 2: There exists a saddlepoint solution, $(g_R, (f_{0L}, f_{1L}))$, of (3.3).

⁴Although $\tilde{\eta}$ is a lower bound on $\eta$, it will be seen later in this chapter that $\tilde{\eta} \approx \eta$ in most cases
of interest.
Proof:

In [7], Lorden proved that no procedure has better asymptotic performance than
Page’s procedure implemented using the log-likelihood ratio. Therefore, for any pair
$(f_0, f_1)$, we have

$$\max_{g \in \mathcal{G}} \tilde{\eta}(g; f_0, f_1) = \tilde{\eta}(g^*; f_0, f_1)$$

where $g^*(x) = \log \frac{f_1(x)}{f_0(x)}$. In particular, when $(f_0, f_1) = (f_{0L}, f_{1L})$ and $g^* = g_R =
\log \frac{f_{1L}(x)}{f_{0L}(x)}$, we have the first equality in (3.4). From the second equality, we see that a
saddlepoint solution $(g_R, (f_{0L}, f_{1L}))$ exists when $(f_{0L}, f_{1L})$ is the pair which minimizes
$\tilde{\eta}(g_R; f_0, f_1)$ over all $(f_0, f_1) \in \mathcal{F}_0 \times \mathcal{F}_1$; that is, when $(f_{0L}, f_{1L})$ is the least favorable.

Let this be so, and the proof is complete. ■


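In [2], Broder showed that when $g$ is the log-likelihood ratio, then $\tilde{\eta} = \eta = I(f_1, f_0)$,
where

$$I(f_1, f_0) = \int_{-\infty}^{\infty} \log\left[\frac{f_1(x)}{f_0(x)}\right] f_1(x)\,dx$$

is the Kullback-Leibler (K-L) divergence. Therefore,

$$\min_{(f_0, f_1) \in \mathcal{F}_0 \times \mathcal{F}_1} \tilde{\eta}(g_R; f_0, f_1) = \min_{(f_0, f_1) \in \mathcal{F}_0 \times \mathcal{F}_1} I(f_1, f_0)$$

and so the least favorable distribution pair is that which minimizes the K-L divergence.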
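As a worked illustration of the K-L divergence (an assumed special case, not one of the uncertainty classes treated below), take Gaussian densities with common standard deviation $\sigma$ and means $-\theta$ and $\theta$. The log-likelihood ratio is then linear in $x$, and the divergence follows by taking its expectation under $f_1$:

```latex
% Assumed densities: f_0 = N(-\theta, \sigma^2), f_1 = N(\theta, \sigma^2)
\log \frac{f_1(x)}{f_0(x)}
  = \frac{(x + \theta)^2 - (x - \theta)^2}{2\sigma^2}
  = \frac{2\theta x}{\sigma^2}
\qquad\Longrightarrow\qquad
I(f_1, f_0)
  = \int_{-\infty}^{\infty} \frac{2\theta x}{\sigma^2}\, f_1(x)\,dx
  = \frac{2\theta}{\sigma^2}\, E[X \mid H_1]
  = \frac{2\theta^2}{\sigma^2}
```

In this special case the divergence grows with the square of the shift-to-noise ratio $\theta/\sigma$, so a pair of densities that effectively reduces this ratio is less favorable.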

The goal now is to determine the pair that minimizes $I(f_1, f_0)$. The following
Lemma is useful for this purpose. It is almost identical to one which appears in [1].

Lemma 1: Suppose $\mathcal{P}_0$ and $\mathcal{P}_1$ are classes of probability density functions such that
all members of $\mathcal{P}_0 \cup \mathcal{P}_1$ have the same support; if $q_0 \in \mathcal{P}_0$ and $q_1 \in \mathcal{P}_1$ are the least
favorable in terms of risk for $\mathcal{P}_0$ versus $\mathcal{P}_1$, then

$$I(q_1, q_0) \le I(p_1, p_0) \qquad \forall p_1 \in \mathcal{P}_1, \; \forall p_0 \in \mathcal{P}_0$$
The proof is similar to Theorem 2.2 in [1], except that it is slightly more direct
and uses the current notation. It will be useful to use the following theorem which
appears in [9] and rephrases part of Theorem 2.2 in [1]:

Theorem 1: Suppose $(p_0, p_1)$ and $(q_0, q_1)$ are two pairs of probability density functions
all of which have the same support. Then, with $b_1 = 1 - b_0$, we have

$$b_0 \int_{\{b_1 q_1 > b_0 q_0\}} q_0(x)\,dx + b_1 \int_{\{b_1 q_1 < b_0 q_0\}} q_1(x)\,dx \;\ge\; b_0 \int_{\{b_1 p_1 > b_0 p_0\}} p_0(x)\,dx + b_1 \int_{\{b_1 p_1 < b_0 p_0\}} p_1(x)\,dx \qquad (3.5)$$

for all $b_0 \in [0, 1]$ if and only if

$$\int_{-\infty}^{\infty} \psi\left[\frac{q_1(x)}{q_0(x)}\right] q_0(x)\,dx \;\ge\; \int_{-\infty}^{\infty} \psi\left[\frac{p_1(x)}{p_0(x)}\right] p_0(x)\,dx$$

for all continuous concave functions $\psi$.

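Proof: (Lemma 1)

Let $\pi_0 = 1 - \pi_1$ denote the prior probability that $p \in \mathcal{P}_0$. To prove the Lemma,
we simply need to let $b_0 = \pi_0$, $b_1 = \pi_1$, and $\psi(u) = -u \log u$. We then directly have
that

$$\pi_0 \int_{\{\pi_1 q_1 > \pi_0 q_0\}} q_0(x)\,dx + \pi_1 \int_{\{\pi_1 q_1 < \pi_0 q_0\}} q_1(x)\,dx \;\ge\; \pi_0 \int_{\{\pi_1 p_1 > \pi_0 p_0\}} p_0(x)\,dx + \pi_1 \int_{\{\pi_1 p_1 < \pi_0 p_0\}} p_1(x)\,dx$$

which is exactly the condition for $q_0 \in \mathcal{P}_0$ and $q_1 \in \mathcal{P}_1$ to be the least favorable
densities in terms of risk for deciding between $\mathcal{P}_0$ and $\mathcal{P}_1$. Using Theorem 1, this
implies

$$-\int_{-\infty}^{\infty} \frac{q_1(x)}{q_0(x)} \log\left(\frac{q_1(x)}{q_0(x)}\right) q_0(x)\,dx \;\ge\; -\int_{-\infty}^{\infty} \frac{p_1(x)}{p_0(x)} \log\left(\frac{p_1(x)}{p_0(x)}\right) p_0(x)\,dx$$

which is the same as $I(q_1, q_0) \le I(p_1, p_0)$.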
The above Lemma establishes a direct relationship between the hypothesis testing
problem  and  the  quickest  detection  problem.  In  particular,  if  a  pair  of  densities  is 
least  favorable  in  terms  of  Bayes  risk,  then  they  will  also  be  the  least  favorable  for  the 
quickest  detection  problem.  The  least  favorable  pair  in  terms  of  risk  has  been  derived 
for  several  uncertainty  classes  [5];  therefore,  previous  results  on  robust  hypothesis 
testing  can  be  applied  directly  to  the  quickest  detection  problem. 

3.3 Robust Quickest Detectors for Two Noise Uncertainty Models

In this section, the robust quickest detector is determined for two noise uncertainty
classes: the ε-contamination class, which is a useful noise model when outliers are
present in the data, and the total variation class, which assumes the noise distribution
is of some nominal shape plus or minus some deviation whose sum total does not
exceed some constant. The choice of noise model is dependent on the particular
application.

For each noise model, we are interested in comparing the performance of the
procedures arising from three different design philosophies. The first is the procedure
which is optimal for the nominal distributions; if this test works well over the entire
class, then a robust approach may not be necessary. Thus, how the test performs
when the noise is not generated by the nominal distribution is of primary interest.
The second is the minimax robust procedure. The minimax criterion maximizes the
performance for the least favorable distributions; however, what sacrifice is made
if the noise arises from the nominal distribution? This question is also answered.
Finally, the performances for the nonparametric procedures with $g$ chosen to be the
sign detector and dead-zone limiter are computed; this enables us to determine when
a nonparametric test gives satisfactory performance, such that the use of a robust
procedure is not necessary.


3.3.1 ε-contaminated Noise Class

The noise classes for the ε-contaminated case are as follows:

$$\mathcal{F}_0 = \{f(x) = (1 - \epsilon_0) f_{n0}(x) + \epsilon_0 h(x) \;\; \forall x \in \mathbb{R}, \; h \in \mathcal{H}\}$$

$$\mathcal{F}_1 = \{f(x) = (1 - \epsilon_1) f_{n1}(x) + \epsilon_1 h(x) \;\; \forall x \in \mathbb{R}, \; h \in \mathcal{H}\}$$

where $\mathcal{H}$ is the class of all legitimate density functions on $\mathbb{R}$, and $\epsilon_0$ and $\epsilon_1$ are
constants that lie in the interval $(0, 1)$. $f_{n0}$ and $f_{n1}$ are the nominal densities in each
class. A popular member of the ε-contaminated class is the Gauss-Gauss mixture
density:

$$f(x) = (1 - \epsilon) \frac{1}{\sqrt{2\pi}\,\sigma_0} \exp\left\{\frac{-x^2}{2\sigma_0^2}\right\} + \epsilon \frac{1}{\sqrt{2\pi}\,\sigma_1} \exp\left\{\frac{-x^2}{2\sigma_1^2}\right\}$$

Here, most of the noise samples are Gaussian with variance $\sigma_0^2$. However, the noise
is sometimes (with probability $\epsilon$) contaminated by samples which are Gaussian with
variance $\sigma_1^2 > \sigma_0^2$. This density is useful for modelling noise that is occasionally
contaminated by outliers: $\sigma_1$ and $\epsilon$ are directly proportional to the magnitude and
frequency of the outliers, respectively.
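Samples from the Gauss-Gauss mixture can be generated by first deciding whether a sample is an outlier and then drawing from the corresponding Gaussian. The sketch below is illustrative only; the function names and the parameter values ($\epsilon = 0.1$, $\sigma_0 = 1$, $\sigma_1 = 5$) are assumptions made for the example.

```python
import math
import random

def mixture_pdf(x, eps, sigma0, sigma1):
    """Gauss-Gauss mixture density with zero mean."""
    def gauss(x, s):
        return math.exp(-x * x / (2.0 * s * s)) / (math.sqrt(2.0 * math.pi) * s)
    return (1.0 - eps) * gauss(x, sigma0) + eps * gauss(x, sigma1)

def sample_gauss_gauss(rng, eps, sigma0, sigma1, n):
    """Draw n samples: with probability eps use sigma1 (outlier), else sigma0."""
    return [rng.gauss(0.0, sigma1 if rng.random() < eps else sigma0)
            for _ in range(n)]

def mixture_variance(eps, sigma0, sigma1):
    """Variance of the zero-mean mixture."""
    return (1.0 - eps) * sigma0 ** 2 + eps * sigma1 ** 2

if __name__ == "__main__":
    rng = random.Random(0)
    xs = sample_gauss_gauss(rng, 0.1, 1.0, 5.0, 50000)
    print("sample variance:", sum(x * x for x in xs) / len(xs))
    print("model variance: ", mixture_variance(0.1, 1.0, 5.0))
```

Even a small contamination probability inflates the variance considerably when $\sigma_1$ is large, which is one way to see why procedures tuned to the nominal Gaussian can be sensitive to outliers.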

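Huber [5] has shown that the least favorable densities in terms of risk for the
ε-contaminated class are:

$$f_{0L}(x) = \begin{cases} (1 - \epsilon_0) f_{n0}(x), & \frac{f_{n1}(x)}{f_{n0}(x)} < c_0 \\ c_0^{-1} (1 - \epsilon_0) f_{n1}(x), & \frac{f_{n1}(x)}{f_{n0}(x)} \ge c_0 \end{cases}$$

$$f_{1L}(x) = \begin{cases} (1 - \epsilon_1) f_{n1}(x), & \frac{f_{n1}(x)}{f_{n0}(x)} > c_1 \\ c_1 (1 - \epsilon_1) f_{n0}(x), & \frac{f_{n1}(x)}{f_{n0}(x)} \le c_1 \end{cases}$$

where $c_0$ and $c_1$ satisfy $0 < c_1 < 1 < c_0 < \infty$ and are selected so that $f_{0L}$ and $f_{1L}$ are
legitimate density functions; in other words, $c_0$ and $c_1$ satisfy

$$\Pr\left\{\frac{f_{n1}(x)}{f_{n0}(x)} < c_0 \,\middle|\, f_{n0}\right\} + c_0^{-1} \Pr\left\{\frac{f_{n1}(x)}{f_{n0}(x)} \ge c_0 \,\middle|\, f_{n1}\right\} = \frac{1}{1 - \epsilon_0} \qquad (3.6)$$

$$\Pr\left\{\frac{f_{n1}(x)}{f_{n0}(x)} > c_1 \,\middle|\, f_{n1}\right\} + c_1 \Pr\left\{\frac{f_{n1}(x)}{f_{n0}(x)} \le c_1 \,\middle|\, f_{n0}\right\} = \frac{1}{1 - \epsilon_1} \qquad (3.7)$$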

It  is  shown  in  [5]  that  c0  and  c1  are  unique.  Furthermore,  the  left  hand  sides  of  (3.6) 
and  (3.7)  are  both  monotonic  functions  of  c0  and  Cj,  respectively,  so  they  can  be  solved 
using  a  bisection  algorithm.  It  can  be  seen  directly  that  the  robust  nonlinearity  for 
this  case  is 


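\[
g_R(x) = \log\frac{f_{1L}(x)}{f_{0L}(x)} = \begin{cases}
\log c_1 + \log\left(\frac{1-\epsilon_1}{1-\epsilon_0}\right), & \frac{f_{n1}(x)}{f_{n0}(x)} \le c_1 \\[4pt]
\log\frac{f_{n1}(x)}{f_{n0}(x)} + \log\left(\frac{1-\epsilon_1}{1-\epsilon_0}\right), & c_1 < \frac{f_{n1}(x)}{f_{n0}(x)} < c_0 \\[4pt]
\log c_0 + \log\left(\frac{1-\epsilon_1}{1-\epsilon_0}\right), & \frac{f_{n1}(x)}{f_{n0}(x)} \ge c_0
\end{cases}
\]

which is seen to be a "censored" version of $\log\frac{f_{n1}(x)}{f_{n0}(x)}$.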
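The censoring can be illustrated with a few lines of Python. This is only a sketch: the function name `robust_nonlinearity` and its argument names are illustrative choices, not notation from the text, and the clipping constants are assumed to have been obtained already from (3.6) and (3.7).

```python
import math

def robust_nonlinearity(llr, c0, c1, eps0=0.0, eps1=0.0):
    """Censored log-likelihood ratio g_R for the epsilon-contaminated class.

    llr    : nominal log-likelihood ratio log(f_n1(x) / f_n0(x)) at the sample
    c0, c1 : clipping constants with 0 < c1 < 1 < c0 (assumed already solved)
    """
    if not (0.0 < c1 < 1.0 < c0):
        raise ValueError("expected 0 < c1 < 1 < c0")
    # Constant offset log((1 - eps1) / (1 - eps0)); zero when eps0 == eps1.
    offset = math.log((1.0 - eps1) / (1.0 - eps0))
    # Clip the nominal log-likelihood ratio to [log c1, log c0].
    clipped = min(max(llr, math.log(c1)), math.log(c0))
    return clipped + offset
```

Large positive or negative values of the nominal log-likelihood ratio, which is where outliers land, are limited to $\log c_0$ and $\log c_1$, while moderate values pass through unchanged.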

Optimal  Performance  for  the  Least  Favorable  Distributions 

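As shown in the previous section,

\[
\eta(g_R; f_{0L}, f_{1L}) = I(f_{1L}, f_{0L}).
\]

Thus, the asymptotic performance is obtained by simply evaluating the K-L divergence for the least favorable distributions. The result is

\begin{align*}
\eta(g_R; f_{0L}, f_{1L})
&= \int_{\{f_{n1}/f_{n0} \le c_1\}} \left[\log c_1 + \log\left(\frac{1-\epsilon_1}{1-\epsilon_0}\right)\right] c_1(1-\epsilon_1)\, f_{n0}(x)\,dx \\
&\quad + \int_{\{c_1 < f_{n1}/f_{n0} < c_0\}} \left[\log\frac{f_{n1}(x)}{f_{n0}(x)} + \log\left(\frac{1-\epsilon_1}{1-\epsilon_0}\right)\right] (1-\epsilon_1)\, f_{n1}(x)\,dx \\
&\quad + \int_{\{f_{n1}/f_{n0} \ge c_0\}} \left[\log c_0 + \log\left(\frac{1-\epsilon_1}{1-\epsilon_0}\right)\right] (1-\epsilon_1)\, f_{n1}(x)\,dx \\
&= \log\left(\frac{1-\epsilon_1}{1-\epsilon_0}\right)(1-\epsilon_1)\left[ c_1 \Pr\left\{\frac{f_{n1}(x)}{f_{n0}(x)} \le c_1 \,\Big|\, f_{n0}\right\} + \Pr\left\{\frac{f_{n1}(x)}{f_{n0}(x)} > c_1 \,\Big|\, f_{n1}\right\} \right] \\
&\quad + (1-\epsilon_1)\int_{\{c_1 < f_{n1}/f_{n0} < c_0\}} \log\left(\frac{f_{n1}(x)}{f_{n0}(x)}\right) f_{n1}(x)\,dx \\
&\quad + (1-\epsilon_1)\left\{ c_1 \log c_1 \Pr\left\{\frac{f_{n1}(x)}{f_{n0}(x)} \le c_1 \,\Big|\, f_{n0}\right\} + \log c_0 \Pr\left\{\frac{f_{n1}(x)}{f_{n0}(x)} \ge c_0 \,\Big|\, f_{n1}\right\} \right\} \\
&= \log\left(\frac{1-\epsilon_1}{1-\epsilon_0}\right) + (1-\epsilon_1)\int_{\{c_1 < f_{n1}/f_{n0} < c_0\}} \log\left(\frac{f_{n1}(x)}{f_{n0}(x)}\right) f_{n1}(x)\,dx \\
&\quad + (1-\epsilon_1)\left\{ c_1 \log c_1 \Pr\left\{\frac{f_{n1}(x)}{f_{n0}(x)} \le c_1 \,\Big|\, f_{n0}\right\} + \log c_0 \Pr\left\{\frac{f_{n1}(x)}{f_{n0}(x)} \ge c_0 \,\Big|\, f_{n1}\right\} \right\} \tag{3.8}
\end{align*}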


where the last equality results by substituting in (3.7). The $\epsilon$-contamination model shown above allows the contamination of the nominal distributions to differ under the null and alternative hypotheses, so that in general $\epsilon_0 \neq \epsilon_1$. While there are some applications where this is the case, the present work focuses on the case where the disorder is solely due to a shift in the mean. Hence, we let $\epsilon_0 = \epsilon_1 = \epsilon$.


$\epsilon$-contaminated Model with Nominal Gaussian Noise: Optimal Performance for the Least Favorable Distributions


Here, we consider the specific case where the noise pair lies in the $\epsilon$-contaminated class with nominal Gaussian distributions, and $\epsilon_0 = \epsilon_1 = \epsilon$. Let

\[
\varphi(x; \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-x^2/2\sigma^2}
\]

and define $\phi_0(x) = \varphi(x + \theta; \sigma_0)$ and $\phi_1(x) = \varphi(x - \theta; \sigma_0)$. The nominal densities are now $f_{n0}(x) = \phi_0(x)$ and $f_{n1}(x) = \phi_1(x)$.

In Figure 3.1, examples of the least favorable pair are shown for $\epsilon = 0.1$ and $\epsilon = 0.25$ along with the nominal pair, where $\theta = \sigma_0^2 = 1$. Notice that the least favorable contamination of $f_{n0}$ involves increasing the density in the neighborhood of $f_{n1}$, and vice versa, making it more difficult to distinguish between the two hypotheses.

The optimal performance for the least favorable distribution pair $(f_{0L}, f_{1L})$ is given in (3.8). We compute the four terms individually. The first one is clearly zero, since $\epsilon_0 = \epsilon_1$. Computing the second term requires us to evaluate the integral


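\[
\int_{\{c_1 < \phi_1/\phi_0 < c_0\}} \log\left(\frac{\phi_1(x)}{\phi_0(x)}\right)\phi_1(x)\,dx
= \frac{2\theta}{\sigma_0^2}\int_{b_1-\theta}^{b_0-\theta} (y+\theta)\,\frac{1}{\sqrt{2\pi}\,\sigma_0}\, e^{-y^2/2\sigma_0^2}\,dy \tag{3.9}
\]

where the change of variables $y = x - \theta$ was made. The right side can be broken up into two parts:

\[
\frac{2\theta}{\sigma_0^2}\int_{b_1-\theta}^{b_0-\theta} \frac{y}{\sqrt{2\pi}\,\sigma_0}\, e^{-y^2/2\sigma_0^2}\,dy
\;+\; \frac{2\theta^2}{\sigma_0^2}\int_{b_1-\theta}^{b_0-\theta} \frac{1}{\sqrt{2\pi}\,\sigma_0}\, e^{-y^2/2\sigma_0^2}\,dy
\]

where $b_0 = \frac{\sigma_0^2}{2\theta}\log c_0$ and $b_1 = \frac{\sigma_0^2}{2\theta}\log c_1$. The left term can be integrated directly and the right term can be written in terms of the standard normal cumulative distribution function $\Phi$. Therefore, (3.9) equals

\[
\sqrt{\frac{2}{\pi}}\,\frac{\theta}{\sigma_0}\left[\exp\left\{\frac{-(b_1-\theta)^2}{2\sigma_0^2}\right\} - \exp\left\{\frac{-(b_0-\theta)^2}{2\sigma_0^2}\right\}\right]
+ \frac{2\theta^2}{\sigma_0^2}\left[\Phi\left(\frac{b_0-\theta}{\sigma_0}\right) - \Phi\left(\frac{b_1-\theta}{\sigma_0}\right)\right]
\]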


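Evaluating the third term requires the following computation:

\[
\Pr\left\{\frac{\phi_1(x)}{\phi_0(x)} \le c_1 \,\Big|\, \phi_0\right\} = \Pr\left\{x \le \frac{\sigma_0^2}{2\theta}\log c_1 \,\Big|\, \phi_0\right\} = \Phi\left(\frac{b_1+\theta}{\sigma_0}\right) \tag{3.10}
\]

Similarly, to evaluate the fourth term, we need:

\[
\Pr\left\{\frac{\phi_1(x)}{\phi_0(x)} \ge c_0 \,\Big|\, \phi_1\right\} = \Pr\left\{x \ge \frac{\sigma_0^2}{2\theta}\log c_0 \,\Big|\, \phi_1\right\} = 1 - \Phi\left(\frac{b_0-\theta}{\sigma_0}\right) = \Phi\left(\frac{-b_0+\theta}{\sigma_0}\right) \tag{3.11}
\]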




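By using (3.9)-(3.11), we finally have that (3.8) is

\begin{align*}
\eta(g_R; f_{0L}, f_{1L})
&= (1-\epsilon)\sqrt{\frac{2}{\pi}}\,\frac{\theta}{\sigma_0}\left[\exp\left\{\frac{-(b_1-\theta)^2}{2\sigma_0^2}\right\} - \exp\left\{\frac{-(b_0-\theta)^2}{2\sigma_0^2}\right\}\right] \\
&\quad + (1-\epsilon)\,\frac{2\theta^2}{\sigma_0^2}\left[\Phi\left(\frac{b_0-\theta}{\sigma_0}\right) - \Phi\left(\frac{b_1-\theta}{\sigma_0}\right)\right] \\
&\quad + (1-\epsilon)\left\{ c_1\log c_1\,\Phi\left(\frac{b_1+\theta}{\sigma_0}\right) + \log c_0\,\Phi\left(\frac{-b_0+\theta}{\sigma_0}\right)\right\}
\end{align*}

where $c_0$ and $c_1$ are chosen to satisfy

\[
\Phi\left(\frac{b_0+\theta}{\sigma_0}\right) + c_0^{-1}\,\Phi\left(\frac{-b_0+\theta}{\sigma_0}\right) = \frac{1}{1-\epsilon} \tag{3.12}
\]
\[
\Phi\left(\frac{-b_1+\theta}{\sigma_0}\right) + c_1\,\Phi\left(\frac{b_1+\theta}{\sigma_0}\right) = \frac{1}{1-\epsilon} \tag{3.13}
\]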
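As a numerical sketch of this computation, the Python fragment below solves (3.13), $\Phi((-b_1+\theta)/\sigma_0) + c_1\Phi((b_1+\theta)/\sigma_0) = 1/(1-\epsilon)$ with $c_1 = \exp(2\theta b_1/\sigma_0^2)$, for $b_1$ by bisection and then evaluates the three terms of $\eta(g_R; f_{0L}, f_{1L})$. The function names are illustrative, not from the text. The code assumes $\epsilon_0 = \epsilon_1 = \epsilon$ and relies on the symmetry $c_0 = c_1^{-1}$ (so $b_0 = -b_1$) that holds in that case.

```python
import math

def Phi(z):
    """Standard normal CDF, written with erfc for accuracy in the tails."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def solve_b1(eps, theta, sigma0, iters=200):
    """Bisection on b1 <= 0 for equation (3.13); assumes 0 < eps < breakdown point."""
    target = 1.0 / (1.0 - eps)

    def lhs(b1):
        c1 = math.exp(2.0 * theta * b1 / sigma0 ** 2)
        return Phi((-b1 + theta) / sigma0) + c1 * Phi((b1 + theta) / sigma0)

    if lhs(0.0) <= target:
        raise ValueError("eps is at or above the breakdown point")
    lo, hi = -(theta + 40.0 * sigma0), 0.0   # lhs(lo) is ~1, lhs(hi) exceeds target
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if lhs(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def eta_least_favorable(eps, theta, sigma0):
    """K-L divergence of the least favorable pair for Gaussian nominals."""
    b1 = solve_b1(eps, theta, sigma0)
    b0 = -b1                                  # symmetry: c0 = 1 / c1
    log_c1 = 2.0 * theta * b1 / sigma0 ** 2
    log_c0 = -log_c1
    c1 = math.exp(log_c1)

    def gauss(b):
        return math.exp(-(b - theta) ** 2 / (2.0 * sigma0 ** 2))

    term_exp = math.sqrt(2.0 / math.pi) * (theta / sigma0) * (gauss(b1) - gauss(b0))
    term_phi = (2.0 * theta ** 2 / sigma0 ** 2) * (
        Phi((b0 - theta) / sigma0) - Phi((b1 - theta) / sigma0))
    term_clip = (c1 * log_c1 * Phi((b1 + theta) / sigma0)
                 + log_c0 * Phi((-b0 + theta) / sigma0))
    return (1.0 - eps) * (term_exp + term_phi + term_clip)
```

As $\epsilon \to 0$ the clipping levels move out to infinity and the result approaches the nominal K-L divergence $2\theta^2/\sigma_0^2$, which gives a quick sanity check on the implementation.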


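Three interesting points can be made about the above equations. First, notice that $\eta(g_R; f_{0L}, f_{1L})$ can alternatively be written in terms of the nominal signal to noise ratio $d = \theta/\sigma_0$:

\begin{align*}
\eta &= (1-\epsilon)\sqrt{\frac{2}{\pi}}\, d\left[\exp\left\{-\frac{1}{2}\left(\frac{1}{2d}\log c_1 - d\right)^2\right\} - \exp\left\{-\frac{1}{2}\left(\frac{1}{2d}\log c_0 - d\right)^2\right\}\right] \\
&\quad + (1-\epsilon)\, 2d^2\left[\Phi\left(\frac{1}{2d}\log c_0 - d\right) - \Phi\left(\frac{1}{2d}\log c_1 - d\right)\right] \\
&\quad + (1-\epsilon)\left\{ c_1\log c_1\,\Phi\left(\frac{1}{2d}\log c_1 + d\right) + \log c_0\,\Phi\left(-\frac{1}{2d}\log c_0 + d\right)\right\}
\end{align*}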

This demonstrates that the performance depends on the nominal signal $\theta$ and the nominal noise power $\sigma_0^2$ only through their ratio, rather than through their absolute values. This is not surprising because: i) we are using Gaussian nominals, and ii) for hypothesis testing in Gaussian noise, the probability of error is a function of the signal to noise ratio [11].

Another observation is that $c_0 = c_1^{-1}$ (and, hence, $b_0 = -b_1$). To see this, first assume that $c_0$ satisfies (3.12), and then substitute $c_0 \leftarrow c_1^{-1}$; the result is equation (3.13). Similar logic shows that (3.12) can be obtained from (3.13). Since both equations must hold, $c_0 = c_1^{-1}$ (note that this is only the case when $\epsilon_0 = \epsilon_1$).
The last point involves the range of values that $\epsilon$ can take on. An upper bound can be obtained by considering the case when $c_0 = c_1$, corresponding to the minimum value of $c_0$ for which the model is still valid (recall that $c_1 < c_0$). Since $c_0 = c_1^{-1}$ and $c_0 > 0$, this occurs at $c_0 = c_1 = 1$ (see footnote 5). Substituting these values into (3.12) and (3.13), so that $b_0 = b_1 = 0$, we have $\epsilon = \epsilon^*$, where

\[
\epsilon^* = 1 - \frac{1}{2\,\Phi\left(\theta/\sigma_0\right)}
\]


Also notice that (3.13) can be written as

\[
1 - \left[\Phi\left(\frac{-b_1+\theta}{\sigma_0}\right) + c_1\,\Phi\left(\frac{b_1+\theta}{\sigma_0}\right)\right]^{-1} = \epsilon
\]

and that the left side of this equation is monotonically increasing in $b_1 = -b_0 < 0$. Therefore, the largest allowable value of $b_1$, namely zero, results in the largest allowable value of $\epsilon$, namely $\epsilon^*$. We have thus defined the breakdown point, $\epsilon^*$, for the $\epsilon$-contaminated noise class: the condition $\epsilon \in [0, \epsilon^*)$ must be met for the problem to be valid. In other words, $\epsilon$ must be small enough to ensure that $\mathcal{F}_0$ and $\mathcal{F}_1$ are distinguishable. The breakdown point is plotted versus $\theta/\sigma_0$ in Figure 3.2.
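The breakdown point is easy to evaluate numerically. Setting $c_0 = c_1 = 1$ (so $b_0 = b_1 = 0$) in (3.12) gives $2\Phi(\theta/\sigma_0) = 1/(1-\epsilon^*)$, which the following sketch solves for $\epsilon^*$. The function name is an illustrative choice.

```python
import math

def breakdown_point(theta, sigma0):
    """Largest admissible contamination eps* for Gaussian nominals N(-theta, sigma0^2), N(theta, sigma0^2)."""
    d = theta / sigma0                                # nominal signal to noise ratio
    phi_d = 0.5 * math.erfc(-d / math.sqrt(2.0))      # standard normal CDF at d
    return 1.0 - 1.0 / (2.0 * phi_d)
```

For $\theta = \sigma_0 = 1$ this gives $\epsilon^* \approx 0.406$, and the value shrinks to zero as $\theta/\sigma_0 \to 0$: the closer the nominals are, the less contamination the model can tolerate.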

Performance Involving the $\epsilon$-contaminated Model with Nominal Gaussian Noise when the Assumed and True Noise Densities Differ

Perhaps the most interesting performance computations involve cases where the noise assumptions used to design the detector do not match the true distributions. Examining such cases enables us to evaluate the performance degradation that results when it is not possible to implement the optimal procedure, for example, when the noise is not completely characterized. It was shown in [2] that when the assumed and true distributions, say $f_0$ and $f_1$, agree, then $\eta = \tilde{\eta} = I(f_1, f_0)$. However, in general, not only is the computation of $\tilde{\eta}$ more complicated, but also the lower bound, $\tilde{\eta}$, on $\eta$ is not tight. We will see, though, that $\tilde{\eta}$ is still a good approximation for $\eta$.

5 Note, however, that this results in the degenerate case of $g_R(x) = 0$.

Figure 3.2: Breakdown point for the $\epsilon$-contaminated noise model.

The procedure for computing $\tilde{\eta}$, detailed in Chapter 2, is reiterated here. Let $g(x)$ be an arbitrary nonlinearity, and let $(f_0, f_1)$ denote the true noise density pair. Then $\tilde{\eta}$ is given by

\[
\tilde{\eta} = \omega_0\, E\{g(x) \mid f_1\} \tag{3.14}
\]

where $\omega_0$ is the unique nonzero root of the moment generating function equality

\[
E\left\{e^{\omega_0 g(x)} \mid f_0\right\} = 1 \tag{3.15}
\]

When $g(x)$ is the log-likelihood ratio between $f_0$ and $f_1$, $\omega_0 = 1$ and (3.14) reduces to the K-L divergence. If this is not the case, (3.15) must be solved for $\omega_0$, and then (3.14) can be computed.
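The two-step procedure, first solving $E\{e^{\omega_0 g(x)} \mid f_0\} = 1$ for the nonzero root $\omega_0$ and then forming $\omega_0 E\{g(x) \mid f_1\}$, can be sketched numerically as follows. The sketch makes several assumptions that are not part of the text: the true densities are taken to be Gaussian with means $-\theta$ and $\theta$, expectations are approximated by a trapezoidal rule on a finite grid, and $E\{g \mid f_0\} < 0$ so that the root is positive. All function names are illustrative.

```python
import math

def gauss_pdf(x, mean, sigma):
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)

def expect(func, mean, sigma, half_width=12.0, n=4801):
    """Trapezoidal approximation of E{func(X)} for X ~ N(mean, sigma^2)."""
    lo = mean - half_width * sigma
    step = 2.0 * half_width * sigma / (n - 1)
    total = 0.0
    for k in range(n):
        x = lo + k * step
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * func(x) * gauss_pdf(x, mean, sigma)
    return total * step

def eta_lower_bound(g, theta, sigma0):
    """Return (omega0, omega0 * E{g | f1}) for f0 = N(-theta, .), f1 = N(theta, .)."""
    def mgf_minus_one(w):
        return expect(lambda x: math.exp(w * g(x)), -theta, sigma0) - 1.0

    lo, hi = 1e-4, 1.0            # assumes the MGF condition is negative just above zero
    while mgf_minus_one(hi) <= 0.0:
        hi *= 2.0                 # expand until the positive root is bracketed
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if mgf_minus_one(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    omega0 = 0.5 * (lo + hi)
    return omega0, omega0 * expect(g, theta, sigma0)
```

With $g$ set to the Gaussian log-likelihood ratio $2\theta x/\sigma_0^2$, the routine should return $\omega_0 \approx 1$ and the K-L divergence $2\theta^2/\sigma_0^2$, matching the special case noted above.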

Table 1 summarizes each of the scenarios for which $\tilde{\eta}$ is computed. For the linear and robust detectors, the particular expressions corresponding to equations (3.14) and (3.15) can be found in Appendix A. The computation of $\tilde{\eta}$ for nonparametric procedures is discussed next.
Table 1: Computations involving the $\epsilon$-contaminated model

noise type          procedure used
Gaussian            linear
Gauss-Gauss         sign
least-favorable     dead-zone
                    robust
Nonparametric  Alternatives 


The robust quickest detector is a parametric procedure. That is, specific assumptions are made about the noise classes: they are "centered" about some known nominal distribution, and the contamination factor $\epsilon$ is known. On the other hand, nonparametric detectors involve procedures that make no assumptions on the noise characteristics, but assume only that a shift in the mean from some negative value to some positive value occurs. It is useful to compare the performance of the quickest detectors arising from these two paradigms. Nonparametric quickest detection has been considered extensively in [2]; the main results of that work are repeated here.

Suppose the nonlinearity is given by

\[
g(x) = \begin{cases}
-1, & x < -d \\
0, & -d \le x \le d \\
1, & x > d
\end{cases}
\]

This is called a random walk nonlinearity. In [2] it is shown that when $g$ is used in Page's test, the lower bound $\tilde{\eta}$ is tight, and is given by

\[
\eta = [p_1 - q_1]\log\frac{q_0}{p_0}
\]

where

\[
p_i = \Pr\{g(x) = 1 \mid f_i\}, \quad i = 0, 1
\]
\[
q_i = \Pr\{g(x) = -1 \mid f_i\}, \quad i = 0, 1
\]
When $d = 0$, $g(x)$ is the sign detector; otherwise, it is called a dead-zone limiter. For the latter case, a value of $d = 0.612 \cdot \sigma_0$ is used, since this choice maximizes the efficacy given that the noise is Gaussian [6]; this is reasonable since the noise is nominally Gaussian with variance $\sigma_0^2$ (see footnote 6). Once $f_0$ and $f_1$ are known, the values $p_i$ and $q_i$ are easily computed, and hence, so is the performance.
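As a brief illustration, the sketch below evaluates $\eta = (p_1 - q_1)\log(q_0/p_0)$ for the sign detector and the dead-zone limiter. It assumes that the true noise is the nominal Gaussian pair, $f_0 = \mathcal{N}(-\theta, \sigma_0^2)$ and $f_1 = \mathcal{N}(\theta, \sigma_0^2)$; for other noise types only the four probabilities change. The function names are illustrative.

```python
import math

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def random_walk_eta(d, theta, sigma0):
    """Asymptotic performance of Page's test with the three-level nonlinearity g."""
    p1 = 1.0 - Phi((d - theta) / sigma0)    # Pr{x >  d | f1}
    q1 = Phi((-d - theta) / sigma0)         # Pr{x < -d | f1}
    p0 = 1.0 - Phi((d + theta) / sigma0)    # Pr{x >  d | f0}
    q0 = Phi((-d + theta) / sigma0)         # Pr{x < -d | f0}
    return (p1 - q1) * math.log(q0 / p0)

sign_eta = random_walk_eta(0.0, 1.0, 1.0)          # sign detector, d = 0
dead_zone_eta = random_walk_eta(0.612, 1.0, 1.0)   # dead-zone limiter, d = 0.612 * sigma0
```

For $\theta = \sigma_0 = 1$ both values fall below the K-L divergence $2\theta^2/\sigma_0^2 = 2$ attained by the optimal (linear) detector in Gaussian noise, with the dead-zone limiter ahead of the sign detector.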

We would like to determine what level of robustness can be gained by using nonparametric techniques. It was shown in [2] that the sign detector and dead-zone limiter often outperform the linear detector when the tails of the noise distribution are heavier than Gaussian. Intuitively, one expects that the robust procedures would outperform their nonparametric counterparts, since more assumptions are incorporated into the model: as we shall see, this is true in most cases.

3.3.2  Total  Variation  Noise  Class 

As in Section 3.3.1, denote the nominal density function pair by $(f_{n0}, f_{n1})$ and the least favorable pair by $(f_{0L}, f_{1L})$. The noise classes for the total variation model are:

\[
\mathcal{F}_0 = \left\{ f : \int_{-\infty}^{\infty} |f(x) - f_{n0}(x)|\,dx \le \delta \right\}
\]
\[
\mathcal{F}_1 = \left\{ f : \int_{-\infty}^{\infty} |f(x) - f_{n1}(x)|\,dx \le \delta \right\}
\]

That is, the sum total of the variation of $f_i \in \mathcal{F}_i$ from the nominal $f_{ni}$ does not exceed $\delta$. This class is useful in cases where the overall shape of the noise density is exactly or approximately known, but where there is still some uncertainty proportional to $\delta$. This uncertainty might arise due to modelling error, or possibly due to assuming that the noise is stationary when in fact it is nonstationary.

6 In [2], the dead-zone limiter is implemented with $d = 0.612 \cdot \sigma$, where $\sigma^2$ is the variance of the $f_i$ rather than $f_{ni}$. It makes no sense to do this here, however, since we do not know what the noise density is in reality, only that the nominal density is $\mathcal{N}(0, \sigma_0^2)$.
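The least favorable densities in this case are [5]:

\[
f_{0L}(x) = \begin{cases}
\frac{1}{1+c_0}\left(f_{n0}(x) + f_{n1}(x)\right), & \frac{f_{n1}(x)}{f_{n0}(x)} \le c_0 \\[4pt]
f_{n0}(x), & c_0 < \frac{f_{n1}(x)}{f_{n0}(x)} < c_1 \\[4pt]
\frac{1}{1+c_1}\left(f_{n0}(x) + f_{n1}(x)\right), & \frac{f_{n1}(x)}{f_{n0}(x)} \ge c_1
\end{cases}
\]
\[
f_{1L}(x) = \begin{cases}
\frac{c_0}{1+c_0}\left(f_{n0}(x) + f_{n1}(x)\right), & \frac{f_{n1}(x)}{f_{n0}(x)} \le c_0 \\[4pt]
f_{n1}(x), & c_0 < \frac{f_{n1}(x)}{f_{n0}(x)} < c_1 \\[4pt]
\frac{c_1}{1+c_1}\left(f_{n0}(x) + f_{n1}(x)\right), & \frac{f_{n1}(x)}{f_{n0}(x)} \ge c_1
\end{cases}
\]

and the robust nonlinearity is

\[
g_R(x) = \log\frac{f_{1L}(x)}{f_{0L}(x)} = \begin{cases}
\log c_0, & \frac{f_{n1}(x)}{f_{n0}(x)} \le c_0 \\[4pt]
\log\frac{f_{n1}(x)}{f_{n0}(x)}, & c_0 < \frac{f_{n1}(x)}{f_{n0}(x)} < c_1 \\[4pt]
\log c_1, & \frac{f_{n1}(x)}{f_{n0}(x)} \ge c_1
\end{cases}
\]

where $0 < c_0 < 1 < c_1 < \infty$, which again is seen to be a "censored" version of $\log\frac{f_{n1}(x)}{f_{n0}(x)}$ (see footnote 7).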

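The least favorable distributions satisfy

\[
\int_{-\infty}^{\infty} |f_{0L}(x) - f_{n0}(x)|\,dx = \delta, \qquad
\int_{-\infty}^{\infty} |f_{1L}(x) - f_{n1}(x)|\,dx = \delta
\]

It is shown in [5] that a sufficient condition for this is that

\[
\int_{\{f_{n1}/f_{n0} \le c_0\}} \left(f_{1L}(x) - f_{n1}(x)\right)dx = \int_{\{f_{n1}/f_{n0} \ge c_1\}} \left(f_{0L}(x) - f_{n0}(x)\right)dx = \frac{\delta}{2} \tag{3.16}
\]

The values of $c_0$ and $c_1$ are determined by solving (3.16). As with the $\epsilon$-contamination noise model, $c_0$ and $c_1$ are unique. In particular, if we let $k_0 = \frac{c_0}{1+c_0}$, then the first term in (3.16) can be written as

\[
\int_{\{f_{n1}/f_{n0} \le c_0\}} \left[\left(f_{n0}(x) + f_{n1}(x)\right)k_0 - f_{n1}(x)\right]dx
\]

which is an increasing function of $k_0$; thus, a bisection algorithm can be used to solve for $k_0$, and hence, for $c_0$ as well. Similarly, $c_1$ is determined by letting $k_1 = \frac{1}{1+c_1}$ and setting

\[
\int_{\{f_{n1}/f_{n0} \ge c_1\}} \left[\left(f_{n0}(x) + f_{n1}(x)\right)k_1 - f_{n0}(x)\right]dx
\]

equal to $\delta/2$; this integral is an increasing function of $k_1$. The above conditions can be written in the more useful form:

\[
k_0 \Pr\left\{\frac{f_{n1}(x)}{f_{n0}(x)} \le c_0 \,\Big|\, f_{n0}\right\} + (k_0 - 1)\Pr\left\{\frac{f_{n1}(x)}{f_{n0}(x)} \le c_0 \,\Big|\, f_{n1}\right\} = \frac{\delta}{2} \tag{3.17}
\]
\[
(k_1 - 1)\Pr\left\{\frac{f_{n1}(x)}{f_{n0}(x)} \ge c_1 \,\Big|\, f_{n0}\right\} + k_1 \Pr\left\{\frac{f_{n1}(x)}{f_{n0}(x)} \ge c_1 \,\Big|\, f_{n1}\right\} = \frac{\delta}{2} \tag{3.18}
\]

7 Note that here $c_0 < c_1$, whereas in Section 3.3.1, $c_1 < c_0$.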

Optimal  Performance  for  the  Least  Favorable  Distributions 


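As shown in Section 2, the optimal robust asymptotic performance is given by the K-L divergence, which for the least favorable total variation density pair is

\begin{align*}
\eta(g_R; f_{0L}, f_{1L})
&= \int_{\{f_{n1}/f_{n0} \le c_0\}} (\log c_0)\,\frac{c_0}{1+c_0}\left(f_{n0}(x) + f_{n1}(x)\right)dx \\
&\quad + \int_{\{c_0 < f_{n1}/f_{n0} < c_1\}} \log\left(\frac{f_{n1}(x)}{f_{n0}(x)}\right) f_{n1}(x)\,dx \\
&\quad + \int_{\{f_{n1}/f_{n0} \ge c_1\}} (\log c_1)\,\frac{c_1}{1+c_1}\left(f_{n0}(x) + f_{n1}(x)\right)dx \\
&= \frac{c_0}{1+c_0}\log c_0\left[\Pr\left\{\frac{f_{n1}}{f_{n0}} \le c_0 \,\Big|\, f_{n0}\right\} + \Pr\left\{\frac{f_{n1}}{f_{n0}} \le c_0 \,\Big|\, f_{n1}\right\}\right] \\
&\quad + \frac{c_1}{1+c_1}\log c_1\left[\Pr\left\{\frac{f_{n1}}{f_{n0}} \ge c_1 \,\Big|\, f_{n0}\right\} + \Pr\left\{\frac{f_{n1}}{f_{n0}} \ge c_1 \,\Big|\, f_{n1}\right\}\right] \\
&\quad + \int_{\{c_0 < f_{n1}/f_{n0} < c_1\}} \log\left(\frac{f_{n1}(x)}{f_{n0}(x)}\right) f_{n1}(x)\,dx
\end{align*}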
Total Variation Model with Nominal Gaussian Noise: Optimal Performance for the Least Favorable Distributions

Here, we consider the specific case where the noise pair lies in the total variation class with nominal Gaussian distributions $f_{n0}(x) = \phi_0(x)$ and $f_{n1}(x) = \phi_1(x)$. In Figure 3.3, the least favorable densities in the total variation class are shown for $\delta = 0.2$ and $\delta = 0.5$ along with the nominal Gaussian densities, where $\theta = \sigma_0^2 = 1$. As with the least favorable $\epsilon$-contaminated densities, $f_{0L}$ and $f_{1L}$ for the total variation model are seen to look like one nominal corrupted by the other, thus increasing the difficulty in distinguishing $\mathcal{F}_0$ from $\mathcal{F}_1$.


Figure 3.3: Nominal Gaussian and least favorable total variation densities. $\theta = \sigma_0^2 = 1$.


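Again, for convenience, define $b_0 = \frac{\sigma_0^2}{2\theta}\log c_0$ and $b_1 = \frac{\sigma_0^2}{2\theta}\log c_1$. Then

\begin{align*}
\eta(g_R; f_{0L}, f_{1L})
&= \frac{c_0}{1+c_0}\log c_0\left[\Phi\left(\frac{b_0+\theta}{\sigma_0}\right) + \Phi\left(\frac{b_0-\theta}{\sigma_0}\right)\right] \\
&\quad + \frac{c_1}{1+c_1}\log c_1\left[\Phi\left(\frac{-b_1-\theta}{\sigma_0}\right) + \Phi\left(\frac{-b_1+\theta}{\sigma_0}\right)\right] \\
&\quad + \sqrt{\frac{2}{\pi}}\,\frac{\theta}{\sigma_0}\left[\exp\left\{\frac{-(b_0-\theta)^2}{2\sigma_0^2}\right\} - \exp\left\{\frac{-(b_1-\theta)^2}{2\sigma_0^2}\right\}\right] \\
&\quad + \frac{2\theta^2}{\sigma_0^2}\left[\Phi\left(\frac{b_1-\theta}{\sigma_0}\right) - \Phi\left(\frac{b_0-\theta}{\sigma_0}\right)\right]
\end{align*}

The last two terms arise when the integral over $\{c_0 < \phi_1/\phi_0 < c_1\}$ is evaluated; note that this is the same computation as in (3.9), except that the roles of $c_0$ and $c_1$ are now reversed. The values of $c_0$ and $c_1$ are determined by solving (3.17) and (3.18) for this particular choice of densities. They are:

\[
k_0\,\Phi\left(\frac{b_0+\theta}{\sigma_0}\right) + (k_0 - 1)\,\Phi\left(\frac{b_0-\theta}{\sigma_0}\right) = \frac{\delta}{2}
\]
\[
(k_1 - 1)\,\Phi\left(\frac{-b_1-\theta}{\sigma_0}\right) + k_1\,\Phi\left(\frac{-b_1+\theta}{\sigma_0}\right) = \frac{\delta}{2}
\]

(Recall that $b_i$ and $k_i$ are functions of $c_i$, for $i = 0, 1$.)

As with the $\epsilon$-contaminated model, $\eta(g_R; f_{0L}, f_{1L})$ for the total variation model can be expressed in terms of the nominal signal to noise ratio $d = \theta/\sigma_0$. We have

\begin{align*}
\eta(g_R; f_{0L}, f_{1L})
&= \frac{c_0}{1+c_0}\log c_0\left[\Phi\left(\frac{1}{2d}\log c_0 + d\right) + \Phi\left(\frac{1}{2d}\log c_0 - d\right)\right] \\
&\quad + \frac{c_1}{1+c_1}\log c_1\left[\Phi\left(-\frac{1}{2d}\log c_1 - d\right) + \Phi\left(-\frac{1}{2d}\log c_1 + d\right)\right] \\
&\quad + \sqrt{\frac{2}{\pi}}\, d\left[\exp\left\{-\frac{1}{2}\left(\frac{1}{2d}\log c_0 - d\right)^2\right\} - \exp\left\{-\frac{1}{2}\left(\frac{1}{2d}\log c_1 - d\right)^2\right\}\right] \\
&\quad + 2d^2\left[\Phi\left(\frac{1}{2d}\log c_1 - d\right) - \Phi\left(\frac{1}{2d}\log c_0 - d\right)\right]
\end{align*}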


Also, since the nominal densities are symmetric, we again have that $c_0 = c_1^{-1}$.

The breakdown point can be computed in a manner similar to before. We notice that again this occurs at $c_0 = c_1 = 1$, and so the critical point for $\delta$ is

\[
\delta^* = \Phi\left(\frac{\theta}{\sigma_0}\right) - \Phi\left(\frac{-\theta}{\sigma_0}\right) = 2\,\Phi\left(\frac{\theta}{\sigma_0}\right) - 1
\]

Hence, for a given $\theta/\sigma_0$, the model is valid for $\delta \in [0, \delta^*)$. The breakdown point is plotted versus $\theta/\sigma_0$ in Figure 3.4.
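The total variation case can be sketched numerically in the same way as the $\epsilon$-contaminated case. The fragment below evaluates $\delta^* = 2\Phi(\theta/\sigma_0) - 1$, solves the Gaussian form of (3.17), $k_0\Phi((b_0+\theta)/\sigma_0) + (k_0-1)\Phi((b_0-\theta)/\sigma_0) = \delta/2$, for $b_0 \le 0$ by bisection, and then evaluates the four terms of $\eta(g_R; f_{0L}, f_{1L})$ using the symmetry $c_1 = c_0^{-1}$. The function names are illustrative, not from the text.

```python
import math

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def tv_breakdown_point(theta, sigma0):
    return 2.0 * Phi(theta / sigma0) - 1.0

def solve_b0_tv(delta, theta, sigma0, iters=200):
    """Bisection on b0 <= 0 for the Gaussian form of (3.17)."""
    if not 0.0 < delta < tv_breakdown_point(theta, sigma0):
        raise ValueError("delta must lie in (0, delta*)")

    def lhs(b0):
        c0 = math.exp(2.0 * theta * b0 / sigma0 ** 2)
        k0 = c0 / (1.0 + c0)
        return k0 * Phi((b0 + theta) / sigma0) + (k0 - 1.0) * Phi((b0 - theta) / sigma0)

    lo, hi = -(theta + 40.0 * sigma0), 0.0   # lhs(lo) ~ 0, lhs(hi) = delta*/2
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if lhs(mid) < 0.5 * delta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def eta_tv_least_favorable(delta, theta, sigma0):
    b0 = solve_b0_tv(delta, theta, sigma0)
    b1 = -b0                                   # symmetry: c1 = 1 / c0
    log_c0 = 2.0 * theta * b0 / sigma0 ** 2
    c0 = math.exp(log_c0)

    def gauss(b):
        return math.exp(-(b - theta) ** 2 / (2.0 * sigma0 ** 2))

    low = (c0 / (1.0 + c0)) * log_c0 * (
        Phi((b0 + theta) / sigma0) + Phi((b0 - theta) / sigma0))
    # c1 / (1 + c1) equals 1 / (1 + c0), and log c1 equals -log c0.
    high = (1.0 / (1.0 + c0)) * (-log_c0) * (
        Phi((-b1 - theta) / sigma0) + Phi((-b1 + theta) / sigma0))
    mid_exp = math.sqrt(2.0 / math.pi) * (theta / sigma0) * (gauss(b0) - gauss(b1))
    mid_phi = (2.0 * theta ** 2 / sigma0 ** 2) * (
        Phi((b1 - theta) / sigma0) - Phi((b0 - theta) / sigma0))
    return low + high + mid_exp + mid_phi
```

For $\theta = \sigma_0 = 1$ the breakdown point is $\delta^* \approx 0.683$, and as $\delta \to 0$ the result again approaches the nominal K-L divergence $2\theta^2/\sigma_0^2$.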
Figure 3.4: Breakdown point for the total variation noise model.

Performance Involving the Total Variation Model with Nominal Gaussian Noise When Assumed and True Noise Densities Differ

Here, $\tilde{\eta}$ is computed for several cases where the noise assumptions used to design the detector do not match the true distributions. A summary of all computations involving the total variation model is shown in Table 2. The discussion in Section 3.3.1 regarding the computation of $\tilde{\eta}$ also pertains here. The details of the derivations can be found in Appendix B.

Table 2: Computations involving the total variation model

noise type          procedures used
Gaussian            linear
least-favorable     sign
                    dead-zone
                    robust

Nonparametric Alternatives

The  discussion  of  Section  3.3.1  also  applies  here.  The  versions  of  Page’s  test  involving 
the  sign  detector  and  dead-zone  limiter  are  again  evaluated  against  the  robust  quickest 
detector. 


3.4  Performance  Comparison 


In this section, the performances for all of the noise/quickest detector combinations are computed for some particular cases involving the $\epsilon$-contamination and total variation models. Throughout, we assume for both noise classes that the nominal distribution is Gaussian with variance $\sigma_0^2 = 1$, and that the mean is $-\theta$ before the disorder and $\theta$ afterwards.

For each of the two noise classes, the performance will be computed for the cases where the true noise pair $(f_0, f_1)$ is: i) the nominal, and ii) the least favorable pair. In addition, the performance is also computed for another member of the $\epsilon$-contaminated class, the Gauss-Gauss mixture, where the contaminating distribution is Gaussian with variance $\sigma_1^2 = 100$. The parameter $\gamma = \sigma_1^2/\sigma_0^2$ is often used to represent the magnitude of the outliers relative to that of the nominal noise samples; here $\gamma = 100$.

It will be useful to define the effective signal to noise ratio (SNR), as follows:

\[
\Psi \triangleq \frac{|E\{X \mid f_i\}|}{\sqrt{\mathrm{Var}\{X \mid f_i\}}}, \qquad f_i \in \mathcal{F}_i,\; i = 0, 1
\]

This is seen to be a weighting of the signal strength (the mean) by the uncertainty (the variance); both of these quantities vary for different members of $\mathcal{F}_i$. Because we are considering the case where the nominal densities are symmetric, that is, $f_{n0}(-x) = f_{n1}(x)$, we see that $E\{X \mid f_1\} = -E\{X \mid f_0\}$. Thus, the effective SNR can now be related to the deflection SNR [3]:

\[
\mathrm{SNR}_{def} \triangleq \frac{|E\{X \mid f_1\} - E\{X \mid f_0\}|^2}{\mathrm{Var}\{X \mid f_0\}}
\]

which is a measure of the relative detectability between the pre- and post-disorder hypotheses, $f_0$ and $f_1$ (see footnote 8); we have:

\[
\Psi = \frac{1}{2}\left(\mathrm{SNR}_{def}\right)^{1/2}
\]

Note also that the effective SNR can be alternatively expressed in decibels:

\[
\Psi_{dB} = 10\log_{10}\Psi^2
\]

It is not difficult to verify that for Gaussian noise

\[
\Psi = \frac{\theta}{\sigma_0}
\]

and for Gauss-Gauss noise

\[
\Psi = \frac{\theta}{\sqrt{(1-\epsilon)\sigma_0^2 + \epsilon\sigma_1^2}}
\]

However, the computation of $\Psi$ for the least favorable distributions is more involved, and therefore can be found in Appendix C.
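A short sketch makes the effect of contamination on the effective SNR concrete. It evaluates $\Psi = \theta / \sqrt{(1-\epsilon)\sigma_0^2 + \epsilon\sigma_1^2}$ (which reduces to $\theta/\sigma_0$ when $\epsilon = 0$) and converts it to decibels with $10\log_{10}\Psi^2$. The function names are illustrative.

```python
import math

def effective_snr(theta, sigma0_sq, eps=0.0, sigma1_sq=0.0):
    """Effective SNR for Gauss-Gauss noise; eps = 0 gives the pure Gaussian case."""
    variance = (1.0 - eps) * sigma0_sq + eps * sigma1_sq
    return theta / math.sqrt(variance)

def to_db(psi):
    return 10.0 * math.log10(psi ** 2)

gaussian_db = to_db(effective_snr(1.0, 1.0))                 # theta = sigma0 = 1
gauss_gauss_db = to_db(effective_snr(1.0, 1.0, 0.1, 100.0))  # eps = 0.1, sigma1^2 = 100
```

With $\theta = \sigma_0 = 1$, contamination of $\epsilon = 0.1$ by samples of variance 100 lowers the effective SNR from 0 dB to roughly $-10.4$ dB, even though the nominal signal to noise ratio is unchanged.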

In the previous section, $\tilde{\eta}$ was computed for several noise and detector combinations. We validate the accuracy of those expressions by comparing the computed values of $\tilde{\eta}$ with estimates of $\eta$ obtained by measuring the inverse of the asymptotic slope of the plot of $D$ versus $\log T$ (recall that this is exactly the definition of $\eta$). In Appendix D, performance curves are shown for each of the noise distribution pairs for $\Psi_{dB} = 0$ and $-10$ dB, followed by a series of tables summarizing the measured and computed values of the asymptotic performance. In all cases, $\eta$ and $\tilde{\eta}$ agree within 2%. Therefore, based on the computations performed, the approximation $\eta \approx \tilde{\eta}$ appears to be well-founded.

8 A more general definition of the deflection SNR is

\[
\mathrm{SNR}_{def} \triangleq \frac{|E\{T(X) \mid f_1\} - E\{T(X) \mid f_0\}|^2}{\mathrm{Var}\{T(X) \mid f_0\}}
\]

which is useful in evaluating the power of a detection procedure where the samples are processed by the nonlinearity $T(\cdot)$. Thus, $T(x) = x$ in the present case. A comparison of other definitions of the SNR is the subject of [3].

In order to evaluate the performance gain or loss in opting to use a robust procedure rather than one that is nonparametric or based on the nominals, we define the robustness index as

\[
\chi_{A,B} \triangleq \frac{\tilde{\eta}_A}{\tilde{\eta}_B}
\]

where $\tilde{\eta}_A$ and $\tilde{\eta}_B$ are the values of $\tilde{\eta}$ for procedures labelled A and B, respectively. The robustness index also allows us to relate the relative expected delays for each procedure. We have seen that $\tilde{\eta}$ closely approximates $\eta$. As a result, for large $T$, equation (3.2) in Section 2 becomes

\[
D \approx \frac{\log T}{\tilde{\eta}} \tag{3.19}
\]

Let $D_A$ and $D_B$ denote the expected delays for procedures A and B, for some $T$ which is the same for both procedures. Using (3.19), we have

\[
\frac{D_B}{D_A} \approx \left(\frac{\log T}{\tilde{\eta}_B}\right) \Big/ \left(\frac{\log T}{\tilde{\eta}_A}\right) = \frac{\tilde{\eta}_A}{\tilde{\eta}_B} = \chi_{A,B}
\]

Thus, a decrease in the expected delay corresponds to an increase in the performance.


3.4.1 $\tilde{\eta}$ Versus $\Psi$ for Different Detector/Noise Combinations

Figures 3.5 through 3.18 illustrate the asymptotic performance for each of the procedures as a function of $\Psi$, as well as the robustness index of the robust procedure over each of the others. Figures 3.5 and 3.6 show the performance when the noise is the nominal Gaussian (see footnote 9). The linear detector is optimal for this case, and this is reflected in the plots. Also notice that, when the robust detector is used, the performance decreases as the assumed level of contamination $\epsilon$ increases. The plot of $\chi$ versus $\Psi$ reveals that the expected delay for the linear detector is only about 75% of that for the robust procedure. This is the price one pays for robustness: when the noise is close to nominal, the robust procedure will react less quickly than the procedure which is optimal for $(f_{n0}, f_{n1})$. On the other hand, the robust procedure outperforms the sign detector for any choice of $\epsilon$ and the dead-zone limiter for smaller $\epsilon$; however, notice that if $\epsilon = 0.2$ is assumed, then the dead-zone limiter is the better choice. We see that as $\epsilon$ gets larger, the robust procedure incorporates less information about the nominals (i.e., the nominal log-likelihood ratio is clipped at lower levels), resulting in a decrease in $\eta$.

Figure 3.5: $\tilde{\eta}$ for Gaussian noise.

Figure 3.6: Robustness index for the robust procedure with $\epsilon = 0.1$ in Gaussian noise.

On the flip side of the above discussion, the conservatism of the robust procedure can be more than offset when the noise is not nominal, as shown in Figures 3.7 through 3.10. Here, Gauss-Gauss noise is considered for $\epsilon = 0.01$ and $\epsilon = 0.1$. The most striking observation is that the linear detector performs much more poorly than any of the other procedures considered: the robust procedure outperforms the linear test by more than a factor of eight in some cases, and would therefore be preferred in a noise environment in which outliers are present. The performances of the nonparametric tests are more reasonable. Also, the relative benefit of the robust over the nonparametric tests in general is smaller for $\epsilon = 0.1$; in fact, the procedure which utilizes the dead-zone limiter outperforms the robust test for $\Psi$ close to unity.

The performance for least favorable $\epsilon$-contaminated noise is shown in Figures 3.11 through 3.14. The quickest detector for the least favorable noise is exactly the robust procedure (by design), and the graphs corroborate this. For the case of small contamination ($\epsilon = 0.01$), the linear detector outperforms the nonparametric tests over a fairly large range of $\Psi$, while requiring less than 10% additional expected delay in most cases (although as $\Psi$ becomes very small, the performance of the linear test deteriorates quickly). On the other hand, observe that one would likely prefer one of the nonparametric tests over the linear test when the contamination is heavier.

9 Here $\chi$ denotes the gain in performance that is realized when the robust procedure is used compared to the procedure listed on the graph.

Figure 3.8: Robustness index ($\epsilon = 0.01$), Gauss-Gauss noise

Figure 3.10: Robustness index ($\epsilon = 0.1$), Gauss-Gauss noise

Figure 3.12: Robustness index ($\epsilon = 0.01$), least favorable $\epsilon$-contaminated noise

Figure 3.14: Robustness index ($\epsilon = 0.1$), least favorable $\epsilon$-contaminated noise

Similar conclusions can be drawn when the least favorable total variation noise is present; these results are shown in Figures 3.15 through 3.18. Note that the overall shape of these performance curves bears a striking similarity to those of Figures 3.11 through 3.14. This fact is not surprising when we compare Figures 3.1 and 3.3: the shapes of the least favorable densities under the two noise uncertainty models are quite similar, and so one expects that the performance of a given test would be comparable for each.

3.4.2  Illustration  of  the  Saddlepoint  Property 

Recall that part of the saddlepoint property in (3.4) stated that

\[
\eta(g_R; f_{0L}, f_{1L}) = \min_{(f_0, f_1) \in \mathcal{F}_0 \times \mathcal{F}_1} \eta(g_R; f_0, f_1)
\]

That is, when the robust procedure is used, the minimal performance results when the least favorable distributions are the true ones. This fact is illustrated in Figures 3.19 and 3.20. In each case, we observe that for a fixed uncertainty class (i.e., fixed $\theta$, $\sigma_0$, and either $\epsilon$ or $\delta$), the least favorable pair produces the lowest performance. Also notice that as the contamination factors $\epsilon$ and $\delta$ increase, the difference between the robust performance for the nominal and least favorable noises increases.

Figure 3.16: Robustness index ($\delta = 0.05$), least favorable total variation noise

Figure 3.19: $\tilde{\eta}$ when the robust procedure for the $\epsilon$-contaminated class is used. The type of noise is indicated on the graph. $\epsilon = 0.01$ (solid) and 0.1 (dashed).

Figure 3.20: $\tilde{\eta}$ when the robust procedure for the total variation class is used. The type of noise is indicated on the graph. $\delta = 0.05$ (solid) and 0.2 (dashed).
3.4.3 $\eta$ Versus Contamination Level for Different Detector/Noise Combinations

In Figures 3.21 through 3.26, $\tilde{\eta}$ is plotted as a function of the contamination levels $\epsilon$ and $\delta$ for the Gauss-Gauss and least favorable noise types.

It is immediately obvious from Figures 3.21 and 3.22 that the linear detector is a poor choice when Gauss-Gauss noise is present even for very small contamination levels, and regardless of the value of $\Psi$. Also apparent is that the advantage of the robust detector over the nonparametric procedures is greater when the SNR is lower. However, the dead-zone limiter outperforms the robust test for $\epsilon > 0.075$ when $\Psi_{dB} = 0$ dB. Finally, in Figure 3.21, notice that for heavy contamination, there is little advantage in opting for the robust procedure over either of the nonparametric tests. Each of these tests shares the property, not possessed by the linear detector, that the observations are processed by "clipping" the larger samples; this property is therefore important when the occurrence of outliers is frequent.

Figures 3.23 and 3.24 show the results for the least favorable noise for the $\epsilon$-contaminated class. First notice that $\tilde{\eta}$ for the robust and linear detectors are the same as $\epsilon \to 0$. This is as expected, since for $\epsilon = 0$, the robust test is the linear test (i.e., no robustness is needed, since there is no uncertainty). However, the performance of the linear test falls off quickly, and is the least desirable of all the tests for heavy contamination. Meanwhile, the robust procedure outperforms all of the others for any level of contamination, a fact that results from the saddlepoint property. Again observe that for $\Psi_{dB} = 0$ dB and under heavy contamination ($\epsilon > 0.1$), there is little advantage of the robust procedure over the nonparametric alternatives.

The results for the least favorable total variation noise are shown in Figures 3.25 and 3.26. As discussed earlier, because the least favorable densities for the two classes are similar, so are the resulting performances. Therefore, the same conclusions can be drawn here as in the previous discussion.


3.4.4  Example 


We conclude this section with an example designed to illustrate the utility of $\tilde{\eta}$.
Suppose that one wishes to detect a shift from $-\theta$ to $\theta$ in an environment where
the noise is not completely characterized; instead, it is known only to be nominally
Gaussian, and is otherwise assumed to lie in the $\varepsilon$-contamination class with $\varepsilon = 0.1$
(relatively heavy contamination). It is of interest to design a system which guarantees
a maximum rate of false alarms, regardless of what the actual noise distributions turn
out to be.

Assume that the observables are samples of some process which is sampled at 10 kHz, and
that $\theta = \sigma_0 = 1$. Suppose that it is required to have no more than: i) one false alarm
per hour, and ii) one false alarm per 100 hours (just over 4 days). This results in
$T_1 = 3.6 \times 10^7$ samples and $T_2 = 3.6 \times 10^9$ samples. The upper bound on the expected
detection delay

$$D_i < \frac{\log T_i}{\tilde{\eta}}, \qquad i = 1, 2$$

can now be determined simply by computing $\tilde{\eta}$ for the detector and noise distributions
of interest. (As we observed previously, since $\tilde{\eta} \approx \eta$, the upper bound is actually a
good approximation of $D$.)
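The arithmetic of this example can be sketched in a few lines of Python. This is only an illustration: the function name `delay_bound_ms` is hypothetical, the natural logarithm is assumed, and the value $\tilde{\eta} = 2.0$ is the one listed in Table D.1 for the linear detector in Gaussian noise at SNR = 0 dB.

```python
import math

def delay_bound_ms(eta, false_alarm_hours, sample_rate_hz=10_000.0):
    """Approximate expected detection delay in ms, using D ~ log(T) / eta."""
    # Mean time between false alarms, expressed in samples.
    T = false_alarm_hours * 3600.0 * sample_rate_hz
    # Delay in samples (natural logarithm assumed).
    delay_samples = math.log(T) / eta
    # Convert samples to milliseconds.
    return 1000.0 * delay_samples / sample_rate_hz

# eta = 2.0: linear detector, Gaussian noise, theta = sigma_0 = 1 (Table D.1).
d1 = delay_bound_ms(2.0, 1)     # one false alarm per hour
d2 = delay_bound_ms(2.0, 100)   # one false alarm per 100 hours
print(round(d1, 2), round(d2, 2))
```

Raising $T$ by a factor of 100 adds only $\log(100)/\tilde{\eta}$ samples to the delay, which is the logarithmic growth the example relies on.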

Table 3 compares the expected delays in milliseconds when using each of the
four detectors in Gaussian, Gauss-Gauss, and the least favorable noise with $\varepsilon = 0.1$.
Notice that the robust detector outperforms both nonparametric tests in each case,
and also that it may not be wise to use the linear detector if there is a large amount
of uncertainty in the noise. Furthermore, observe that the additional delay one must
incur for raising $T$ by a large amount (from 1 hour to 100 hours) is relatively small
since, in the asymptotic realm, the expected delay is proportional to the logarithm
of $T$.

Figure 3.24: $\tilde{\eta}$ versus $\varepsilon$ for least favorable $\varepsilon$-contaminated noise

Finally, notice the apparent paradox that for the linear test, the expected delay
for Gauss-Gauss noise is far greater than that for the “least favorable” noise. Recall that
the least favorable densities are those that minimize the asymptotic performance
measure when the optimal processor, the log-likelihood ratio, is used (see the
saddlepoint condition in equation (3.4)). Therefore, the least favorable noise is
guaranteed to produce the largest delays only for the robust procedure.


Table 3: Estimate of expected detection delay (ms)

               Gaussian         Gauss-Gauss      Least favorable
test           T_1     T_2      T_1     T_2      T_1     T_2
linear         0.87    1.10     13.47   17.0     3.28    4.15
sign           1.53    1.93     1.92    2.43     2.97    3.76
dead-zone      --      --       1.64    2.08     2.89    3.66
robust         1.14    1.45     1.56    1.97     2.62    3.31


3.5  Locally Robust Quickest Detection for the $\varepsilon$-contamination Class

In the preceding sections, the disorder was taken to be a shift from $-\theta$ to $\theta$, where
$\theta$ was arbitrary. Of particular importance is the case where the disturbance is very
small; in other words, $\theta$ is close to zero. We are therefore motivated to consider robust
quickest detection procedures for the so-called local or weak signal case.

3.5.1  Local Behavior of the Asymptotic Performance Measure

At the disorder time, suppose the random variables undergo a shift in distribution
from $f(x + \theta)$ to $f(x - \theta)$. We have seen that the optimal procedure is Page’s test
using $g(x) = \log \frac{f(x - \theta)}{f(x + \theta)}$, and that the asymptotic performance is given by the K-L
divergence:

$$\eta = \int_{-\infty}^{\infty} f(x - \theta) \log \frac{f(x - \theta)}{f(x + \theta)}\, dx$$

It is easy to see that $\lim_{\theta \to 0} \eta = 0$. However, this tells us little about the behavior of
the performance for weak signals, other than that it becomes increasingly poor as the
magnitude of the disturbance approaches zero. Since the best asymptotic performance
is achieved when $g$ is the log-likelihood ratio, we conclude that the performance will
also be poor for arbitrary $g$.

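We want to study the behavior of $\tilde{\eta}$ for $\theta$ near zero. A natural approach is to
express $\tilde{\eta}$ as a Taylor series about $\theta = 0$ as follows:

$$\tilde{\eta} = \tilde{\eta}\Big|_{\theta=0} + \theta \left.\frac{\partial \tilde{\eta}}{\partial \theta}\right|_{\theta=0} + \frac{\theta^2}{2} \left.\frac{\partial^2 \tilde{\eta}}{\partial \theta^2}\right|_{\theta=0} + \cdots$$
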

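Let $\mathcal{G}$ be the collection of all nonlinearities $g$ satisfying $E\{g(X - \theta) \mid f\} < 0 <
E\{g(X + \theta) \mid f\}$ for all $\theta$ in some neighborhood of zero, with $E\{g(X \pm \theta) \mid f\}$
continuous at $\theta = 0$; then $E\{g(X) \mid f\} = 0$ for $g \in \mathcal{G}$. It is shown in [2] that

$$\lim_{\theta \to 0} \frac{\partial \tilde{\eta}}{\partial \theta} = 0, \qquad \lim_{\theta \to 0} \frac{\partial^2 \tilde{\eta}}{\partial \theta^2} = 4\mathcal{E}$$

where

$$\mathcal{E} = \mathcal{E}(f, g) = \frac{\left[ \int g'(x) f(x)\, dx \right]^2}{\int g^2(x) f(x)\, dx}$$

is the well-known efficacy. The local behavior of $\tilde{\eta}$ when a particular nonlinearity $g$ is
used can be approximated by the first nonzero term of the Taylor expansion, which
is the second order term, yielding:

$$\tilde{\eta} \approx 2\theta^2 \mathcal{E} \qquad (3.20)$$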

Therefore,  the  difference  in  local  asymptotic  performance  for  two  different  g’s  can  be 
evaluated  by  computing  the  efficacy  for  each. 


3.5.2  Computation of the Locally Robust Quickest Detector

Since the efficacy describes the small signal behavior of Page’s test, the minimax
criterion can be applied directly to the efficacy in order to obtain the weak signal
robust quickest detector. Thus, we need to solve:

$$\max_{g \in \mathcal{G}} \min_{f \in \mathcal{F}} \mathcal{E}(f, g) \qquad (3.21)$$

Again, it is useful to determine a saddlepoint solution.

Proposition 3: There exists a saddlepoint, $(g_R, f_L)$, of (3.21).


Proof:

It is well-known that for any fixed density $f(x)$, the efficacy is maximized by using

$$g(x) = a \left( -\frac{f'(x)}{f(x)} \right) = a\, g_{lo}(x), \qquad \forall a > 0$$

where $g_{lo}(x)$ denotes the locally optimal nonlinearity. A proof using variational
calculus is given in [8]. Using this $g(x)$, it is not difficult to show (with the additional
condition that $f(-\infty) = f(\infty) = 0$) that

$$\max_{g \in \mathcal{G}} \mathcal{E}(f, g) = \int \left( \frac{f'(x)}{f(x)} \right)^2 f(x)\, dx = I_F(f)$$

which is just Fisher’s information.¹⁰

Thus, the saddlepoint solution is $(g_R, f_L)$, where $g_R(x) = -\frac{f_L'(x)}{f_L(x)}$ and $f_L$ is the
density that minimizes Fisher’s information. The existence and uniqueness of such a
density are demonstrated in [4]. ∎


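Recall that the $\varepsilon$-contaminated noise class with nominal density $f_n$ is

$$\mathcal{F} = \{ f(x) : f(x) = (1 - \varepsilon) f_n(x) + \varepsilon h(x),\ h \in \mathcal{H} \}$$

The least favorable distribution (minimizing Fisher’s information) is [4]:

$$f_L(x) = \begin{cases} (1 - \varepsilon) f_n(x_0)\, e^{k(x - x_0)}, & x \le x_0 \\ (1 - \varepsilon) f_n(x), & x_0 < x < x_1 \\ (1 - \varepsilon) f_n(x_1)\, e^{-k(x - x_1)}, & x \ge x_1 \end{cases}$$

where $\varepsilon$ and $k$ are related via

$$\int_{x_0}^{x_1} f_n(x)\, dx + \frac{f_n(x_0) + f_n(x_1)}{k} = \frac{1}{1 - \varepsilon}$$

and where $x_0$ and $x_1$ are the endpoints of the interval on which
$\left| \frac{f_n'(x)}{f_n(x)} \right| \le k$. The resulting robust nonlinearity is

$$g_R(x) = -\frac{f_L'(x)}{f_L(x)} = \begin{cases} -k, & x \le x_0 \\ -\dfrac{f_n'(x)}{f_n(x)}, & x_0 < x < x_1 \\ k, & x \ge x_1 \end{cases}$$
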

3.5.3  The  Locally  Robust  Quickest  Detector  with  Gaussian 
Nominals 

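Once again, of particular interest is the case where $f_n(x)$ is the Gaussian density
function with zero mean and variance $\sigma_0^2$, i.e. $f_n(x) = \varphi(x; \sigma_0)$. Here, notice that

$$\left| \frac{f_n'(x)}{f_n(x)} \right| = \frac{|x|}{\sigma_0^2} \le k$$

implies $x_1 = -x_0 = k\sigma_0^2$. The least favorable distribution is now

$$f_L(x) = \begin{cases} (1 - \varepsilon)\, \varphi(x; \sigma_0), & |x| \le k\sigma_0^2 \\ (1 - \varepsilon)\, \varphi(k\sigma_0^2; \sigma_0)\, e^{-k(|x| - k\sigma_0^2)}, & |x| > k\sigma_0^2 \end{cases}$$

where

$$\frac{2\varphi(k\sigma_0^2; \sigma_0)}{k} - 2\Phi(-k\sigma_0) = \frac{\varepsilon}{1 - \varepsilon}$$

and the robust nonlinearity is

$$g_R(x) = \begin{cases} -k, & x < -k\sigma_0^2 \\ \dfrac{x}{\sigma_0^2}, & |x| \le k\sigma_0^2 \\ k, & x > k\sigma_0^2 \end{cases}$$

¹⁰ Note that $I_F(f)$ is independent of $a$. Consequently it is the shape, rather than the scale, of $g$
that determines the local performance.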

It is interesting to note that the form of this test, a clipped linearity, is the same as
that of the large signal test with one exception: the robust weak signal nonlinearity
is not a function of the signal strength $\theta$, since it arises under the assumption that
$\theta \to 0$.

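By using the robust nonlinearity and the least favorable distribution, we can now
compute the minimax efficacy, which is just $I_F(f_L)$. We have

$$\begin{aligned}
\mathcal{E} &= \int_{-\infty}^{\infty} g_R^2(x) f_L(x)\, dx \\
&= \frac{1 - \varepsilon}{\sigma_0^4} \int_{-k\sigma_0^2}^{k\sigma_0^2} \frac{x^2}{\sqrt{2\pi}\sigma_0}\, e^{-x^2 / 2\sigma_0^2}\, dx + \frac{2k^2 (1 - \varepsilon)\, e^{k^2 \sigma_0^2 / 2}}{\sqrt{2\pi}\sigma_0} \int_{k\sigma_0^2}^{\infty} e^{-kx}\, dx \\
&= \frac{1 - \varepsilon}{\sigma_0^2} \left[ 1 - 2\Phi(-k\sigma_0) - 2k\sigma_0^2\, \varphi(k\sigma_0^2; \sigma_0) \right] + \frac{2k(1 - \varepsilon)}{\sqrt{2\pi}\sigma_0}\, e^{-k^2 \sigma_0^2 / 2} \\
&= \frac{1}{\sigma_0^2} \left[ 1 - \frac{2(1 - \varepsilon)(1 + k^2 \sigma_0^2)\, \varphi(k\sigma_0^2; \sigma_0)}{k} \right] + \frac{2k(1 - \varepsilon)}{\sqrt{2\pi}\sigma_0}\, e^{-k^2 \sigma_0^2 / 2}
\end{aligned}$$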
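As a numerical illustration of the Gaussian-nominal case, the following sketch solves the relation $2\varphi(k\sigma_0^2; \sigma_0)/k - 2\Phi(-k\sigma_0) = \varepsilon/(1-\varepsilon)$ for the clipping level $k$ by bisection and then evaluates the minimax efficacy. The function names (`solve_k`, `minimax_efficacy`) and the bracketing interval are assumptions made for this example, not part of the original text.

```python
import math

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x, sigma):
    """Zero-mean Gaussian density with standard deviation sigma."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)

def solve_k(eps, sigma0=1.0, iters=200):
    """Bisection for k in 2*phi(k*s^2; s)/k - 2*Phi(-k*s) = eps/(1 - eps), eps > 0."""
    target = eps / (1.0 - eps)

    def h(k):
        return 2.0 * phi(k * sigma0 ** 2, sigma0) / k - 2.0 * Phi(-k * sigma0)

    # h decreases from +infinity (k -> 0) to 0 (k -> infinity); assumed bracket.
    lo, hi = 1e-9 / sigma0, 40.0 / sigma0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def minimax_efficacy(eps, sigma0=1.0):
    """Fisher information of the least favorable density (last line of the derivation)."""
    k = solve_k(eps, sigma0)
    p = phi(k * sigma0 ** 2, sigma0)
    bracket = 1.0 - 2.0 * (1.0 - eps) * (1.0 + k ** 2 * sigma0 ** 2) * p / k
    return bracket / sigma0 ** 2 + 2.0 * k * (1.0 - eps) * p

k = solve_k(0.1)
eff = minimax_efficacy(0.1)
print(k, eff)
```

With $\sigma_0 = 1$ the efficacy of the linear detector in purely Gaussian noise is 1, so the printed efficacy can be read directly as the fraction of nominal local performance that is still guaranteed under the least favorable contamination.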


3.5.4  Comparison  of  Several  Procedures  for  Local  Quickest 
Detection 

The asymptotic performance for other detector and noise combinations can also be
obtained by computing the efficacy and then applying (3.20). We can also define
the robustness index $\chi$ in the same manner as in Section 4. Suppose we have two
procedures, labelled A and B, and let $\theta$ be some small positive number close to zero.
The performance gain of procedure A relative to procedure B is

$$\chi_{A,B} = \frac{\tilde{\eta}_A}{\tilde{\eta}_B} \approx \frac{2\theta^2 \mathcal{E}_A}{2\theta^2 \mathcal{E}_B} = \frac{\mathcal{E}_A}{\mathcal{E}_B} \qquad (3.22)$$

We can recognize the final expression to be the relative asymptotic efficiency from the
classical hypothesis testing literature. Again, $\chi$ can also be interpreted as a measure
of the loss or gain in expected delay for one procedure relative to another as described
in Section 4.

In Figures 3.27-3.32, the performance for the weak signal scenario is plotted versus
$\varepsilon$. Three distributions from the $\varepsilon$-contaminated class are considered: the Gaussian
distribution, the Gauss-Gauss mixture, and the least favorable. In each case, the
nominal density is Gaussian with zero mean and unit variance.

For each noise type, two plots are given. First, the efficacies resulting from the
linear, sign, dead-zone, and robust detectors are computed; the equations for these
are given in Appendix E. Second, the performance gain of the robust procedure with
respect to the other three is illustrated; that is, $\chi_{A,B}$ is computed as in equation
(3.22), where procedure A is the robust procedure, and procedure B is, in turn, each
of the other procedures.
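The efficacy comparison can be sketched numerically for unit-variance Gaussian noise. The code below is a rough illustration, not the computation of Appendix E: it uses the equivalent form $[\int g(x) f'(x)\,dx]^2 / \int g^2(x) f(x)\,dx$ of the efficacy (obtained by integration by parts, so that the discontinuous sign detector can be handled), a simple midpoint rule, and an assumed clipping level $k = 1.14$ for the clipped-linear detector. All function names are hypothetical.

```python
import math

def gaussian_pdf(x):
    """Zero-mean, unit-variance Gaussian density (the nominal noise)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def efficacy(g, lo=-12.0, hi=12.0, n=40_000):
    """Midpoint-rule estimate of [int g f' dx]^2 / int g^2 f dx for Gaussian f."""
    dx = (hi - lo) / n
    num = 0.0
    den = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        fx = gaussian_pdf(x)
        num += g(x) * (-x * fx) * dx   # f'(x) = -x f(x) for the unit Gaussian
        den += g(x) ** 2 * fx * dx
    return num * num / den

def linear(x):
    return x

def sign(x):
    return 1.0 if x > 0 else -1.0

def clipped(x, k=1.14):
    """Clipped linearity; k = 1.14 is an illustrative, assumed clip level."""
    return max(-k, min(k, x))

eff = {name: efficacy(g) for name, g in
       [("linear", linear), ("sign", sign), ("clipped", clipped)]}

# Robustness index of the clipped detector relative to the sign detector.
chi_clipped_vs_sign = eff["clipped"] / eff["sign"]
print(eff, chi_clipped_vs_sign)
```

In Gaussian noise the linear detector attains efficacy 1 and the sign detector the classical value $2/\pi$, so the ratio of any two entries gives the corresponding index $\chi_{A,B}$ directly.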

In Figures 3.27 and 3.28, we see that for Gaussian noise the robust detector
designed assuming a contamination $\varepsilon$ outperforms both of the nonparametric tests,
but not surprisingly, it is not as effective as the linear test. Nevertheless, the loss in
performance in using the robust rather than the linear procedure is less than 10% for
$\varepsilon < 0.1$, the range of interest in many real world problems.

The results for Gauss-Gauss mixture noise with $\sigma_1^2 = 100$ are shown in Figures
3.29 and 3.30. Again, the robust detector outperforms the nonparametric alternatives.
However, notice this time that the performance of the linear detector is very poor
when any more than a small amount of contamination is present.

Figure 3.28: Robustness index vs. $\varepsilon$ for Gaussian noise

Figure 3.30: Robustness index vs. $\varepsilon$ for Gauss-Gauss noise, $\gamma = 100$

Figure 3.32: Robustness index vs. $\varepsilon$ for least favorable $\varepsilon$-contaminated noise

Finally, the performance for the least favorable noise is shown in Figures 3.31 and
3.32. From the saddlepoint condition of Proposition 2:

$$\max_{g} \tilde{\eta}(g; f_{L0}, f_{L1}) = \tilde{\eta}(g_R; f_{L0}, f_{L1})$$

and so we expect that the robust detector will outperform each of the others for
any $\varepsilon$; the plots confirm this expectation. Notice that for small contamination, the
performance of the linear and robust detectors is close. This is not surprising, since for
small $\varepsilon$, the least favorable distributions will still be close to the nominals. However,
the performance of the linear detector falls off faster than any of the others as $\varepsilon$ gets
large, and the potential gain in using the robust procedure over the linear detector
increases.

3.6  Conclusions

This chapter examines procedures for robust quickest detection when the noise
densities are known only to lie within some uncertainty classes, each of which was defined
in terms of an allowable deviation from some nominal density. The robust detector
was derived by applying the minimax criterion directly to the asymptotic
performance measure for Page’s test, $\eta$, and it was shown that $\eta$ exhibits a saddlepoint
solution. It was also shown that when the robust processor is used, $\tilde{\eta}$ is equal to
the Kullback-Leibler divergence, and that the least favorable densities are those that
minimize this quantity. Moreover, a formal connection between robust quickest
detection and robust hypothesis testing was established, namely the processor used for
the former is just the log-likelihood ratio of the least favorable densities in terms of
risk. Thus, we were able to apply previous results on robust hypothesis testing to
the present problem. This enabled us to obtain the robust quickest detectors for the
$\varepsilon$-contaminated and total variation noise uncertainty classes.

The performance of the robust procedure was compared to that of several
nonparametric versions of Page’s test via the computation of $\eta$ versus both SNR and level
of uncertainty. It was shown that the robust procedures exhibit good performance
over a range of noise distributions within the uncertainty class and outperform the
nonparametric alternatives in most cases, yet they are more conservative than the
optimal (when the densities are known) procedure. However, procedures which are
optimized for the nominal distributions, which in this case were Gaussian, suffered
severe degradation in performance in some instances when the true distributions lay
elsewhere in the uncertainty class. For example, the procedure using the optimal
processor for Gaussian noise, the linear processor, was shown in Section 3.4.1 to produce
expected delays which were greater than those of the robust procedure, in some cases
by nearly a factor of eight. It was also shown that in situations where a fixed false
alarm rate is desired, the delay to detect the disorder can be maintained at reasonable
levels over the entire class if the robust procedure is used.

The locally robust quickest detection procedure was also derived by applying the
minimax criterion to the classical efficacy, which is directly proportional to the
asymptotic performance in the weak signal scenario; in this case, the least favorable density
is that which minimizes Fisher’s information. The performance of the locally robust
procedure was compared to that of nonparametric alternatives involving the sign
detector and dead-zone limiter. The local version of the robust quickest detector
exhibited good performance over the entire noise class, outperforming both the linear
and nonparametric procedures in most cases.

As a potential area for future work, it would be interesting to examine the
performance of the robust procedure when there is a mismatch in the assumed noise
class. For example, one might assume that $\varepsilon = \varepsilon_1$, when in fact the contamination
is $\varepsilon_2 \neq \varepsilon_1$. One would suspect that for small mismatch, there would be only slight
degradation. This is an important area, since the contamination factor is often not
known exactly, but estimated from the data. This raises the additional question of
how good one’s estimate of $\varepsilon$ must be in order to design a robust test. Also, the
method for deriving the minimax robust procedure via the asymptotic performance
measure could be applied to multivariate models. This method does not assume that
the process is scalar, and so some results should carry over directly. In Chapter 4,
we examine the related, but more specific problem of quickest detection for Gaussian
noise with unknown mean vector and covariance matrix.

3.7  Appendices


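3.7.A  Computation of $\tilde{\eta}$ for Various Noise/Detector Combinations
Involving the $\varepsilon$-contaminated Class

The general procedure for computing $\tilde{\eta}$ is given in the text. Below, the performance
is computed for some specific cases involving the $\varepsilon$-contaminated noise uncertainty
class. Throughout, the procedure is designed to detect a shift in the mean from $-\theta$ to
$\theta$, and the nominal distribution of the $\varepsilon$-contaminated class is Gaussian with variance
$\sigma_0^2$. The constants $c_0$ and $c_1$ are chosen to satisfy

$$\Phi\left( \frac{b_0 + \theta}{\sigma_0} \right) + \frac{1}{c_0}\, \Phi\left( \frac{-b_0 + \theta}{\sigma_0} \right) = \frac{1}{1 - \varepsilon}$$

$$\Phi\left( \frac{-b_1 + \theta}{\sigma_0} \right) + c_1\, \Phi\left( \frac{b_1 + \theta}{\sigma_0} \right) = \frac{1}{1 - \varepsilon}$$

where $b_0 = \frac{\sigma_0^2}{2\theta} \log c_0$ and $b_1 = \frac{\sigma_0^2}{2\theta} \log c_1$. Note that $c_1 < 1 < c_0$, and so $b_1 < 0 < b_0$.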
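A sketch of how the constants might be obtained numerically is given below. It assumes the two defining equations take the form $\Phi((b_0+\theta)/\sigma_0) + c_0^{-1}\Phi((\theta-b_0)/\sigma_0) = 1/(1-\varepsilon)$ and $\Phi((\theta-b_1)/\sigma_0) + c_1\Phi((b_1+\theta)/\sigma_0) = 1/(1-\varepsilon)$, with $b_i = \frac{\sigma_0^2}{2\theta}\log c_i$; under that assumption the symmetry of the problem gives $c_1 = 1/c_0$, so only one bisection is needed. The function name and the search interval are hypothetical choices for this example.

```python
import math

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def censoring_constants(theta, sigma0, eps, iters=200):
    """Return (c0, c1, b0, b1) for the eps-contaminated class, Gaussian nominal."""
    target = 1.0 / (1.0 - eps)

    def lhs(b0):
        # c0 and b0 are tied together by b0 = sigma0^2 / (2 theta) * log(c0).
        c0 = math.exp(2.0 * theta * b0 / sigma0 ** 2)
        return Phi((b0 + theta) / sigma0) + Phi((theta - b0) / sigma0) / c0

    if lhs(0.0) <= target:
        raise ValueError("no solution with c0 > 1 for these parameters")
    # lhs decreases from 2*Phi(theta/sigma0) at b0 = 0 toward 1; assumed bracket.
    lo, hi = 0.0, theta + 10.0 * sigma0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if lhs(mid) > target:
            lo = mid
        else:
            hi = mid
    b0 = 0.5 * (lo + hi)
    c0 = math.exp(2.0 * theta * b0 / sigma0 ** 2)
    # By symmetry of the shift (-theta to theta), c1 = 1/c0 and b1 = -b0.
    return c0, 1.0 / c0, b0, -b0

c0, c1, b0, b1 = censoring_constants(theta=1.0, sigma0=1.0, eps=0.1)
print(c0, c1, b0, b1)
```

The `ValueError` branch covers the case where the contamination is so heavy relative to the shift that no clipping level with $c_0 > 1$ exists.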


Gauss-Gauss Noise, Robust Detector for the $\varepsilon$-contaminated Class

In this section, $\tilde{\eta}$ is computed where the robust detector for the $\varepsilon$-contaminated class
is used and the noise is Gauss-Gauss with contamination factor $\varepsilon$:

$$f(x) = (1 - \varepsilon)\, \varphi(x; \sigma_0) + \varepsilon\, \varphi(x; \sigma_1)$$

Here $\varphi(x; \sigma)$ is the Gaussian density with variance $\sigma^2$; thus, the noise is nominally
Gaussian with variance $\sigma_0^2$, but is occasionally (with probability $\varepsilon$) contaminated by
impulsive noise modelled as Gaussian with variance $\sigma_1^2 > \sigma_0^2$.

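Define the following:

$$f_0(x) = f(x + \theta) \qquad f_1(x) = f(x - \theta)$$

$$\phi_0(x) = \varphi(x + \theta; \sigma_0) \qquad \phi_1(x) = \varphi(x - \theta; \sigma_0)$$

$$\zeta_0(x) = \varphi(x + \theta; \sigma_1) \qquad \zeta_1(x) = \varphi(x - \theta; \sigma_1)$$

The robust nonlinearity is

$$g_R(x) = \log \frac{f_{L1}(x)}{f_{L0}(x)} = \begin{cases} \log c_1, & \frac{\phi_1(x)}{\phi_0(x)} \le c_1 \\ \log \frac{\phi_1(x)}{\phi_0(x)}, & c_1 < \frac{\phi_1(x)}{\phi_0(x)} < c_0 \\ \log c_0, & \frac{\phi_1(x)}{\phi_0(x)} \ge c_0 \end{cases}$$

Therefore, we are required to compute

$$\tilde{\eta} = \omega_0 E\{g_R(X) \mid f_1\} \qquad (A.1)$$

where $\omega_0$ satisfies

$$E\{e^{\omega_0 g_R(X)} \mid f_0\} = 1$$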

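Observe that the linearity of the expectation operator allows us to write:

$$E\{e^{\omega_0 g_R(X)} \mid f_0\} = (1 - \varepsilon) E\{e^{\omega_0 g_R(X)} \mid \phi_0\} + \varepsilon E\{e^{\omega_0 g_R(X)} \mid \zeta_0\} \qquad (A.2)$$

$$E\{g_R(X) \mid f_1\} = (1 - \varepsilon) E\{g_R(X) \mid \phi_1\} + \varepsilon E\{g_R(X) \mid \zeta_1\} \qquad (A.3)$$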

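To compute $E\{e^{\omega g_R(X)} \mid f_0\}$, we first compute $E\{e^{\omega g_R(X)} \mid \zeta_0\}$. We have

$$E\{e^{\omega g_R(X)} \mid \zeta_0\} = \int_{A_1} c_1^{\omega}\, \zeta_0(x)\, dx + \int_{A_2} \left[ \frac{\phi_1(x)}{\phi_0(x)} \right]^{\omega} \zeta_0(x)\, dx + \int_{A_3} c_0^{\omega}\, \zeta_0(x)\, dx$$

where $A_1 = \{\phi_1/\phi_0 \le c_1\} = \{x \le b_1\}$, $A_2 = \{c_1 < \phi_1/\phi_0 < c_0\} = \{b_1 < x < b_0\}$, and
$A_3 = \{\phi_1/\phi_0 \ge c_0\} = \{x \ge b_0\}$. The integrals over $A_1$ and $A_3$ are easy to compute. To
compute the $A_2$ term, first let $\nu = \frac{2\theta\omega}{\sigma_0^2}$; then:

$$\begin{aligned}
\int_{A_2} \left[ \frac{\phi_1(x)}{\phi_0(x)} \right]^{\omega} \zeta_0(x)\, dx &= \int_{b_1}^{b_0} \frac{1}{\sqrt{2\pi}\sigma_1} \exp\left\{ -\frac{1}{2\sigma_1^2} \left( x^2 + 2(\theta - \sigma_1^2 \nu)x + \theta^2 \right) \right\} dx \\
&= \exp\left\{ -\frac{2\theta^2}{\sigma_0^2}\, \omega \left( 1 - \frac{\sigma_1^2}{\sigma_0^2}\, \omega \right) \right\} \left[ \Phi\left( \frac{b_0 + \theta\left(1 - 2\frac{\sigma_1^2}{\sigma_0^2}\omega\right)}{\sigma_1} \right) - \Phi\left( \frac{b_1 + \theta\left(1 - 2\frac{\sigma_1^2}{\sigma_0^2}\omega\right)}{\sigma_1} \right) \right]
\end{aligned}$$

Combining this with the other two integrals, we have

$$\begin{aligned}
E\{e^{\omega g_R(X)} \mid \zeta_0\} = {} & c_1^{\omega}\, \Phi\left( \frac{b_1 + \theta}{\sigma_1} \right) + c_0^{\omega}\, \Phi\left( -\frac{b_0 + \theta}{\sigma_1} \right) \\
& + \exp\left\{ -\frac{2\theta^2}{\sigma_0^2}\, \omega \left( 1 - \frac{\sigma_1^2}{\sigma_0^2}\, \omega \right) \right\} \left[ \Phi\left( \frac{b_0 + \theta\left(1 - 2\frac{\sigma_1^2}{\sigma_0^2}\omega\right)}{\sigma_1} \right) - \Phi\left( \frac{b_1 + \theta\left(1 - 2\frac{\sigma_1^2}{\sigma_0^2}\omega\right)}{\sigma_1} \right) \right]
\end{aligned}$$

Also, observe that we can easily obtain $E\{e^{\omega g_R(X)} \mid \phi_0\}$ by noting that

$$E\{e^{\omega g_R(X)} \mid \phi_0\} = E\{e^{\omega g_R(X)} \mid \zeta_0\}\Big|_{\sigma_1 = \sigma_0}$$
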

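Therefore using (A.2), we have that $\omega_0$ is the nonzero root of the equation

$$\begin{aligned}
1 = {} & (1 - \varepsilon) \exp\left\{ -\frac{2\theta^2}{\sigma_0^2}\, \omega(1 - \omega) \right\} \left[ \Phi\left( \frac{b_0 + \theta(1 - 2\omega)}{\sigma_0} \right) - \Phi\left( \frac{b_1 + \theta(1 - 2\omega)}{\sigma_0} \right) \right] \\
& + \varepsilon \exp\left\{ -\frac{2\theta^2}{\sigma_0^2}\, \omega \left( 1 - \frac{\sigma_1^2}{\sigma_0^2}\, \omega \right) \right\} \left[ \Phi\left( \frac{b_0 + \theta\left(1 - 2\frac{\sigma_1^2}{\sigma_0^2}\omega\right)}{\sigma_1} \right) - \Phi\left( \frac{b_1 + \theta\left(1 - 2\frac{\sigma_1^2}{\sigma_0^2}\omega\right)}{\sigma_1} \right) \right] \\
& + c_1^{\omega} \left[ (1 - \varepsilon)\, \Phi\left( \frac{b_1 + \theta}{\sigma_0} \right) + \varepsilon\, \Phi\left( \frac{b_1 + \theta}{\sigma_1} \right) \right] + c_0^{\omega} \left[ (1 - \varepsilon)\, \Phi\left( -\frac{b_0 + \theta}{\sigma_0} \right) + \varepsilon\, \Phi\left( -\frac{b_0 + \theta}{\sigma_1} \right) \right]
\end{aligned} \qquad (A.4)$$

A bisection routine can be used to obtain $\omega_0$, where the search region is restricted to
the region where the right hand side of (A.4) is increasing.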

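$E\{g_R(X) \mid f_1\}$ is computed similarly by first computing $E\{g_R(X) \mid \zeta_1\}$ and then
using the fact that

$$E\{g_R(X) \mid \phi_1\} = E\{g_R(X) \mid \zeta_1\}\Big|_{\sigma_1 = \sigma_0}$$

The final result is

$$\begin{aligned}
E\{g_R(X) \mid f_1\} = {} & \log c_1 \left[ (1 - \varepsilon)\, \Phi\left( \frac{b_1 - \theta}{\sigma_0} \right) + \varepsilon\, \Phi\left( \frac{b_1 - \theta}{\sigma_1} \right) \right] \\
& + \log c_0 \left[ (1 - \varepsilon)\, \Phi\left( \frac{-b_0 + \theta}{\sigma_0} \right) + \varepsilon\, \Phi\left( \frac{-b_0 + \theta}{\sigma_1} \right) \right] \\
& + \frac{2\theta^2}{\sigma_0^2} \left\{ (1 - \varepsilon) \left[ \Phi\left( \frac{b_0 - \theta}{\sigma_0} \right) - \Phi\left( \frac{b_1 - \theta}{\sigma_0} \right) \right] + \varepsilon \left[ \Phi\left( \frac{b_0 - \theta}{\sigma_1} \right) - \Phi\left( \frac{b_1 - \theta}{\sigma_1} \right) \right] \right\} \\
& + \sqrt{\frac{2}{\pi}}\, \frac{\theta}{\sigma_0} \left\{ (1 - \varepsilon) \left[ \exp\left\{ \frac{-(b_1 - \theta)^2}{2\sigma_0^2} \right\} - \exp\left\{ \frac{-(b_0 - \theta)^2}{2\sigma_0^2} \right\} \right] + \varepsilon\, \frac{\sigma_1}{\sigma_0} \left[ \exp\left\{ \frac{-(b_1 - \theta)^2}{2\sigma_1^2} \right\} - \exp\left\{ \frac{-(b_0 - \theta)^2}{2\sigma_1^2} \right\} \right] \right\}
\end{aligned}$$

Finally, $\tilde{\eta}$ is obtained by using (A.1).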


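Nominal Gaussian Noise, Robust Detector for the $\varepsilon$-contaminated Class

The asymptotic performance for this case is the same as in the previous section, where
we let $\varepsilon = 0$ in the Gauss-Gauss density function. That is, $\tilde{\eta} = \omega_0 E\{g_R(X) \mid \phi_1\}$, where
$\omega_0$ is the unique nonzero root of

$$1 = c_1^{\omega}\, \Phi\left( \frac{b_1 + \theta}{\sigma_0} \right) + c_0^{\omega}\, \Phi\left( -\frac{b_0 + \theta}{\sigma_0} \right) + \exp\left\{ -\frac{2\theta^2}{\sigma_0^2}\, \omega(1 - \omega) \right\} \left[ \Phi\left( \frac{b_0 + \theta(1 - 2\omega)}{\sigma_0} \right) - \Phi\left( \frac{b_1 + \theta(1 - 2\omega)}{\sigma_0} \right) \right]$$

and

$$\begin{aligned}
E\{g_R(X) \mid \phi_1\} = {} & \log c_1\, \Phi\left( \frac{b_1 - \theta}{\sigma_0} \right) + \log c_0\, \Phi\left( \frac{-b_0 + \theta}{\sigma_0} \right) + \frac{2\theta^2}{\sigma_0^2} \left[ \Phi\left( \frac{b_0 - \theta}{\sigma_0} \right) - \Phi\left( \frac{b_1 - \theta}{\sigma_0} \right) \right] \\
& + \sqrt{\frac{2}{\pi}}\, \frac{\theta}{\sigma_0} \left[ \exp\left\{ \frac{-(b_1 - \theta)^2}{2\sigma_0^2} \right\} - \exp\left\{ \frac{-(b_0 - \theta)^2}{2\sigma_0^2} \right\} \right]
\end{aligned}$$

Gauss-Gauss Noise, Linear Detector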

If the noise is assumed to be the nominal Gaussian, the nonlinearity is $g(x) = \frac{2\theta x}{\sigma_0^2}$.
However, it was shown in [2] that $\tilde{\eta}$ is invariant to changes in scale. Therefore, for
simplicity, we eliminate the constant multiplier and use $g(x) = x$ instead.

For this case, $\omega_0$ is the unique nonzero root of $E\{e^{\omega X} \mid f_0\} = 1$, where

$$E\{e^{\omega X} \mid f_0\} = (1 - \varepsilon) E\{e^{\omega X} \mid \phi_0\} + \varepsilon E\{e^{\omega X} \mid \zeta_0\}$$

We can alternatively write this equation as

$$E\{e^{\omega X} \mid f_0\} = (1 - \varepsilon) M_{\phi_0}(\omega) + \varepsilon M_{\zeta_0}(\omega)$$

where $M_{\phi_0}$ and $M_{\zeta_0}$ are the moment generating functions for the Gaussian densities
$\phi_0$ and $\zeta_0$, respectively, which can be found in a number of references (for example,
[10]). Thus, $\omega_0$ is the nonzero root of the equation

$$1 = (1 - \varepsilon) \exp\left\{ -\theta\omega + \tfrac{1}{2}\omega^2 \sigma_0^2 \right\} + \varepsilon \exp\left\{ -\theta\omega + \tfrac{1}{2}\omega^2 \sigma_1^2 \right\} \qquad (A.5)$$

This is a transcendental equation which can again be solved using a bisection routine
restricted to a suitable interval. The $E\{g(X) \mid f_1\}$ term is simply

$$E\{g(X) \mid f_1\} = (1 - \varepsilon) E\{X \mid \phi_1\} + \varepsilon E\{X \mid \zeta_1\} = (1 - \varepsilon)\theta + \varepsilon\theta = \theta$$

and so $\tilde{\eta} = \omega_0 \theta$ for this case.
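A minimal sketch of such a bisection for the linear detector follows. It assumes the root equation has the form $1 = (1-\varepsilon)\exp\{-\theta\omega + \omega^2\sigma_0^2/2\} + \varepsilon\exp\{-\theta\omega + \omega^2\sigma_1^2/2\}$, i.e. a mixture of Gaussian moment generating functions with means $-\theta$; the function names, the doubling search used to bracket the root, and the parameter values ($\theta = \sigma_0 = 1$, $\sigma_1^2 = 100$, $\varepsilon = 0.1$) are assumptions for illustration.

```python
import math

def mgf_minus_one(w, theta, sigma0, sigma1, eps):
    """Mixture MGF of the pre-change density evaluated at w, minus 1."""
    nominal = math.exp(-theta * w + 0.5 * w * w * sigma0 ** 2)
    contam = math.exp(-theta * w + 0.5 * w * w * sigma1 ** 2)
    return (1.0 - eps) * nominal + eps * contam - 1.0

def linear_detector_eta(theta, sigma0, sigma1, eps, iters=200):
    """eta = w0 * theta, with w0 the nonzero (positive) root of the MGF equation."""
    def f(w):
        return mgf_minus_one(w, theta, sigma0, sigma1, eps)

    # f is convex, f(0) = 0 and f'(0) = -theta < 0, so the nonzero root is positive.
    # Bracket it by doubling (assumes the root is larger than the starting point).
    hi = 1e-9
    while f(hi) <= 0.0:
        hi *= 2.0
    lo = 0.5 * hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    w0 = 0.5 * (lo + hi)
    return w0 * theta

eta_gaussian = linear_detector_eta(1.0, 1.0, 1.0, 0.0)   # no contamination
eta_mixture = linear_detector_eta(1.0, 1.0, 10.0, 0.1)   # sigma_1^2 = 100, eps = 0.1
delay_ms = math.log(3.6e7) / eta_mixture / 10.0           # 10 kHz sampling, T_1
print(eta_gaussian, eta_mixture, delay_ms)
```

With these assumed values the uncontaminated case returns $2\theta^2/\sigma_0^2 = 2$, while the mixture case drops to roughly 0.13, corresponding to a delay of about 13.5 ms for one false alarm per hour at 10 kHz.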

Nominal Gaussian Noise, Linear Detector

If the noise were simply Gaussian with variance $\sigma_0^2$ ($\varepsilon = 0$), $\omega_0$ in (A.5) could be solved
for explicitly, namely $\omega_0 = \frac{2\theta}{\sigma_0^2}$. Note that $\omega_0 > 0$, as expected. The performance is
then $\tilde{\eta} = \frac{2\theta^2}{\sigma_0^2}$. Alternatively, one could simply compute the performance directly using
the K-L divergence.

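Least Favorable $\varepsilon$-contaminated Noise, Linear Detector

The nonlinearity for this procedure is $g(x) = \frac{2\theta x}{\sigma_0^2}$. We are required to compute

$$\tilde{\eta} = \omega_0 E\{g(X) \mid f_{L1}\}$$

where $\omega_0$ is the non-zero root of the moment generating function equality

$$E\{e^{\omega_0 g(X)} \mid f_{L0}\} = 1 \qquad (A.6)$$

Recall that

$$f_{L0}(x) = \begin{cases} (1 - \varepsilon)\, \phi_0(x), & \frac{\phi_1(x)}{\phi_0(x)} < c_0 \\ \frac{1 - \varepsilon}{c_0}\, \phi_1(x), & \frac{\phi_1(x)}{\phi_0(x)} \ge c_0 \end{cases}$$

and

$$f_{L1}(x) = \begin{cases} (1 - \varepsilon)\, \phi_1(x), & \frac{\phi_1(x)}{\phi_0(x)} > c_1 \\ (1 - \varepsilon)\, c_1\, \phi_0(x), & \frac{\phi_1(x)}{\phi_0(x)} \le c_1 \end{cases}$$

The left side of (A.6) is

$$(1 - \varepsilon) \left[ \int_{-\infty}^{b_0} e^{\nu x} \phi_0(x)\, dx + \frac{1}{c_0} \int_{b_0}^{\infty} e^{\nu x} \phi_1(x)\, dx \right]$$

where $\nu = \frac{2\theta\omega}{\sigma_0^2}$. We will solve the two integrals separately (in each case, this is done by
completing the square). The first one is

$$\begin{aligned}
\int_{-\infty}^{b_0} e^{\nu x} \phi_0(x)\, dx &= \int_{-\infty}^{b_0} \frac{1}{\sqrt{2\pi}\sigma_0} \exp\left\{ -\frac{1}{2\sigma_0^2} \left[ x^2 + 2(\theta - \sigma_0^2 \nu)x + \theta^2 \right] \right\} dx \\
&= \exp\left\{ \frac{-2\theta\sigma_0^2 \nu + \sigma_0^4 \nu^2}{2\sigma_0^2} \right\} \int_{-\infty}^{b_0} \frac{1}{\sqrt{2\pi}\sigma_0} \exp\left\{ -\frac{1}{2\sigma_0^2} \left( x + (\theta - \sigma_0^2 \nu) \right)^2 \right\} dx \\
&= \exp\left\{ -\theta\nu + \tfrac{1}{2}\sigma_0^2 \nu^2 \right\} \Phi\left( \frac{b_0 + (\theta - \sigma_0^2 \nu)}{\sigma_0} \right) \\
&= \exp\left\{ \frac{2\theta^2}{\sigma_0^2}\, \omega(\omega - 1) \right\} \Phi\left( \frac{b_0 + \theta(1 - 2\omega)}{\sigma_0} \right)
\end{aligned}$$
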

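The second integral is derived in a similar manner. We have:

$$\int_{b_0}^{\infty} e^{\nu x} \phi_1(x)\, dx = \exp\left\{ \frac{2\theta^2}{\sigma_0^2}\, \omega(\omega + 1) \right\} \Phi\left( \frac{-b_0 + \theta(1 + 2\omega)}{\sigma_0} \right)$$

Thus, the value of $\omega_0$ is determined by obtaining the nonzero root of

$$(1 - \varepsilon) \left[ \exp\left\{ \frac{2\theta^2}{\sigma_0^2}\, \omega(\omega - 1) \right\} \Phi\left( \frac{b_0 + \theta(1 - 2\omega)}{\sigma_0} \right) + \frac{1}{c_0} \exp\left\{ \frac{2\theta^2}{\sigma_0^2}\, \omega(\omega + 1) \right\} \Phi\left( \frac{-b_0 + \theta(1 + 2\omega)}{\sigma_0} \right) \right] = 1$$

The term $E\{g(X) \mid f_{L1}\}$ is given by

$$(1 - \varepsilon) \left[ \frac{2\theta}{\sigma_0^2} \int_{b_1}^{\infty} x\, \phi_1(x)\, dx + c_1\, \frac{2\theta}{\sigma_0^2} \int_{-\infty}^{b_1} x\, \phi_0(x)\, dx \right]$$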

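Compute the two integrals separately. The first is:

$$\begin{aligned}
\frac{2\theta}{\sigma_0^2} \int_{b_1}^{\infty} \frac{x}{\sqrt{2\pi}\sigma_0} \exp\left\{ \frac{-(x - \theta)^2}{2\sigma_0^2} \right\} dx
&= \frac{2\theta}{\sigma_0^2} \int_{b_1 - \theta}^{\infty} \frac{y}{\sqrt{2\pi}\sigma_0} \exp\left\{ \frac{-y^2}{2\sigma_0^2} \right\} dy + \frac{2\theta^2}{\sigma_0^2} \int_{b_1 - \theta}^{\infty} \frac{1}{\sqrt{2\pi}\sigma_0} \exp\left\{ \frac{-y^2}{2\sigma_0^2} \right\} dy \\
&= \frac{2\theta^2}{\sigma_0^2}\, \Phi\left( \frac{-b_1 + \theta}{\sigma_0} \right) + \sqrt{\frac{2}{\pi}}\, \frac{\theta}{\sigma_0} \exp\left\{ \frac{-(b_1 - \theta)^2}{2\sigma_0^2} \right\}
\end{aligned}$$

where the change of variables $y = x - \theta$ was used. Similarly, the second integral is

$$\frac{2\theta}{\sigma_0^2} \int_{-\infty}^{b_1} x\, \phi_0(x)\, dx = -\frac{2\theta^2}{\sigma_0^2}\, \Phi\left( \frac{b_1 + \theta}{\sigma_0} \right) - \sqrt{\frac{2}{\pi}}\, \frac{\theta}{\sigma_0} \exp\left\{ \frac{-(b_1 + \theta)^2}{2\sigma_0^2} \right\}$$

Combining the two, we have

$$E\{g(X) \mid f_{L1}\} = (1 - \varepsilon) \left\{ \frac{2\theta^2}{\sigma_0^2} \left[ \Phi\left( \frac{-b_1 + \theta}{\sigma_0} \right) - c_1\, \Phi\left( \frac{b_1 + \theta}{\sigma_0} \right) \right] + \sqrt{\frac{2}{\pi}}\, \frac{\theta}{\sigma_0} \left[ \exp\left\{ \frac{-(b_1 - \theta)^2}{2\sigma_0^2} \right\} - c_1 \exp\left\{ \frac{-(b_1 + \theta)^2}{2\sigma_0^2} \right\} \right] \right\}$$

$\tilde{\eta}$ can now be computed.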


3.7.B  Computation of $\tilde{\eta}$ for Various Noise/Detector Combinations
Involving the Total Variation Class

The general procedure for computing $\tilde{\eta}$ is given in the text. Below, the performance
is computed for some specific cases involving the total variation noise uncertainty
model. Throughout, the procedure is designed to detect a shift in the mean from
$-\theta$ to $\theta$, and the nominal distribution of the total variation class is Gaussian with
variance $\sigma_0^2$. The constants $c_0$ and $c_1$ are chosen to satisfy


c°  $  f  jl  ^ 


1  +  Cq 


1 


c0 


1  +  Co 


c0 


Cl  +  1  $t~bl  +  6 


1  +  Cl 


c0 


1  +  Cl 


Co 


6 

2 

8 

2 


where $b_0 = \frac{\sigma_0^2}{2\theta} \log c_0$ and $b_1 = \frac{\sigma_0^2}{2\theta} \log c_1$. Note that $c_0 < 1 < c_1$, and so $b_0 < 0 < b_1$.


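Gaussian Noise, Robust Detector for the Total Variation Class

In this section, $\tilde{\eta}$ is computed where the robust detector for the total variation class
is used and the noise is Gaussian with variance $\sigma_0^2$.

Let $\varphi(x; \sigma)$ denote the Gaussian density with variance $\sigma^2$, and define the following:

$$\phi_0(x) = \varphi(x + \theta; \sigma_0) \qquad \phi_1(x) = \varphi(x - \theta; \sigma_0)$$

The robust nonlinearity is

$$g_R(x) = \log \frac{f_{L1}(x)}{f_{L0}(x)} = \begin{cases} \log c_0, & \frac{\phi_1(x)}{\phi_0(x)} \le c_0 \\ \log \frac{\phi_1(x)}{\phi_0(x)}, & c_0 < \frac{\phi_1(x)}{\phi_0(x)} < c_1 \\ \log c_1, & \frac{\phi_1(x)}{\phi_0(x)} \ge c_1 \end{cases}$$

We are required to compute

$$\tilde{\eta} = \omega_0 E\{g_R(X) \mid \phi_1\} \qquad (B.1)$$

where $\omega_0$ satisfies

$$E\{e^{\omega_0 g_R(X)} \mid \phi_0\} = 1$$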

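First we need to compute $E\{e^{\omega g_R(X)} \mid \phi_0\}$. However, notice that this has already
been obtained in Section B.1.1, with the exception that $c_0$ and $c_1$ (and also $b_0$ and
$b_1$) are switched. Therefore $\omega_0$ is the nonzero root of the equation

$$1 = c_0^{\omega}\, \Phi\left( \frac{b_0 + \theta}{\sigma_0} \right) + c_1^{\omega}\, \Phi\left( -\frac{b_1 + \theta}{\sigma_0} \right) + \exp\left\{ -\frac{2\theta^2}{\sigma_0^2}\, \omega(1 - \omega) \right\} \left[ \Phi\left( \frac{b_1 + \theta(1 - 2\omega)}{\sigma_0} \right) - \Phi\left( \frac{b_0 + \theta(1 - 2\omega)}{\sigma_0} \right) \right] \qquad (B.2)$$

A simple bisection routine can be used to obtain $\omega_0$, where the search region is
restricted to the region where the right hand side of (B.2) is increasing.

The expression for $E\{g_R(X) \mid \phi_1\}$ was also obtained in Section B.1.1, with the
aforementioned constants swapped. The result for the present case is

$$\begin{aligned}
E\{g_R(X) \mid \phi_1\} = {} & \log c_0\, \Phi\left( \frac{b_0 - \theta}{\sigma_0} \right) + \log c_1\, \Phi\left( \frac{-b_1 + \theta}{\sigma_0} \right) + \frac{2\theta^2}{\sigma_0^2} \left[ \Phi\left( \frac{b_1 - \theta}{\sigma_0} \right) - \Phi\left( \frac{b_0 - \theta}{\sigma_0} \right) \right] \\
& + \sqrt{\frac{2}{\pi}}\, \frac{\theta}{\sigma_0} \left[ \exp\left\{ \frac{-(b_0 - \theta)^2}{2\sigma_0^2} \right\} - \exp\left\{ \frac{-(b_1 - \theta)^2}{2\sigma_0^2} \right\} \right]
\end{aligned}$$

Finally, $\tilde{\eta}$ is obtained by using (B.1).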


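Least Favorable Total Variation Noise, Linear Detector

The nonlinearity for this procedure is $g(x) = \frac{2\theta x}{\sigma_0^2}$. We are required to compute

$$\tilde{\eta} = \omega_0 E\{g(X) \mid f_{L1}\}$$

where $\omega_0$ is the non-zero root of the moment generating function equality

$$E\{e^{\omega_0 g(X)} \mid f_{L0}\} = 1 \qquad (B.3)$$

Recall that

$$f_{L0}(x) = \begin{cases} \frac{1}{1 + c_0} \left( \phi_0(x) + \phi_1(x) \right), & \frac{\phi_1(x)}{\phi_0(x)} \le c_0 \\ \phi_0(x), & c_0 < \frac{\phi_1(x)}{\phi_0(x)} < c_1 \\ \frac{1}{1 + c_1} \left( \phi_0(x) + \phi_1(x) \right), & \frac{\phi_1(x)}{\phi_0(x)} \ge c_1 \end{cases}$$

$$f_{L1}(x) = \begin{cases} \frac{c_0}{1 + c_0} \left( \phi_0(x) + \phi_1(x) \right), & \frac{\phi_1(x)}{\phi_0(x)} \le c_0 \\ \phi_1(x), & c_0 < \frac{\phi_1(x)}{\phi_0(x)} < c_1 \\ \frac{c_1}{1 + c_1} \left( \phi_0(x) + \phi_1(x) \right), & \frac{\phi_1(x)}{\phi_0(x)} \ge c_1 \end{cases}$$

The left side of (B.3) is

$$\frac{1}{1 + c_0} \int_{-\infty}^{b_0} e^{\nu x} \left( \phi_0(x) + \phi_1(x) \right) dx + \int_{b_0}^{b_1} e^{\nu x} \phi_0(x)\, dx + \frac{1}{1 + c_1} \int_{b_1}^{\infty} e^{\nu x} \left( \phi_0(x) + \phi_1(x) \right) dx$$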

These  integrals  can  be  solved  by  collecting  the  powers  of  the  exponentials  and  com¬ 


parts.  The  result  is 


E{#(x)  |  hi}  = 
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and  77  can  be  obtained  directly. 


3.7.C  Variance of the Least Favorable Distributions

In Section 4 the effective signal to noise ratio was defined as

$$\mathrm{SNR} = \frac{E\{X \mid f_{L1}\}}{\sqrt{\mathrm{Var}\{X \mid f_{L1}\}}}$$

The variance of the least favorable distributions is simply given by

$$\mathrm{Var}\{X \mid f_{L1}\} = E\{X^2 \mid f_{L1}\} - \left( E\{X \mid f_{L1}\} \right)^2$$

The expressions for $E\{X \mid f_{L1}\}$ and $E\{X^2 \mid f_{L1}\}$ are given below for each of the two
noise classes.

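$\varepsilon$-contamination Model

$$\begin{aligned}
E\{X \mid f_{L1}\} &= \int_{-\infty}^{\infty} x f_{L1}(x)\, dx \\
&= \frac{(1 - \varepsilon)\sigma_0}{\sqrt{2\pi}} \left[ \exp\left\{ \frac{-(b_1 - \theta)^2}{2\sigma_0^2} \right\} - c_1 \exp\left\{ \frac{-(b_1 + \theta)^2}{2\sigma_0^2} \right\} \right] + \theta(1 - \varepsilon) \left[ \Phi\left( \frac{-b_1 + \theta}{\sigma_0} \right) - c_1\, \Phi\left( \frac{b_1 + \theta}{\sigma_0} \right) \right]
\end{aligned}$$

$$\begin{aligned}
E\{X^2 \mid f_{L1}\} &= \int_{-\infty}^{\infty} x^2 f_{L1}(x)\, dx \\
&= c_1 (1 - \varepsilon) \left[ \frac{\sigma_0 (\theta - b_1)}{\sqrt{2\pi}} \exp\left\{ \frac{-(b_1 + \theta)^2}{2\sigma_0^2} \right\} + (\theta^2 + \sigma_0^2)\, \Phi\left( \frac{b_1 + \theta}{\sigma_0} \right) \right] \\
&\quad + (1 - \varepsilon) \left[ \frac{\sigma_0 (\theta + b_1)}{\sqrt{2\pi}} \exp\left\{ \frac{-(b_1 - \theta)^2}{2\sigma_0^2} \right\} + (\theta^2 + \sigma_0^2)\, \Phi\left( \frac{-b_1 + \theta}{\sigma_0} \right) \right]
\end{aligned}$$

Total Variation Model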


/oo 

*/«(»)* 

-oo 


1  +  c0  {  [  V  ao  J  \  ao  J . 

°o  J— (&o  — ^)2|  j  ~{bo 0)2 


+  -^-\e  *(— — 

1  +  C1  l  V  °o 


.  0-0 

+  2?  exp 


<h  -  er 


+  exp 


*  >KV)-*(V)i 

+  ^[exp{  p 


-bx-0 


-jbi  ±oy 

2o-q 


-ih-oy 


/oo 

x2fL1(x)da 

-oo 


+  T-p{Z^}-^“p{d^1 


cr0(6  +  b\ ) 

+  “  27“  CXP 


-{h-oy}  ,  a0(&i  -  0)  (-{h+0) 

2a20  j+  2t r  6XP  {  2a02 


+  (^2  +  ^o2)  * 


&!  -0 


&o-0 


-(to  -  ey 


+  £  (6o  +  g)exp  . >  —  (fei  +  0)  exp 


-(*1  -  ^)2 
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3.7.D  Performance  Computations  for  Several  Noise  Distri¬ 
butions 

Below are plots of $D$ versus $\log T$ for several noise distributions and quickest
detection procedures. The plots were obtained via the Markov approximation technique
described in Chapter 2. Estimates of $\eta$ are obtained by measuring the slope of the
performance curves for large $T$ and taking the inverse. The particular detectors that
were used are indicated on the graphs.

All values of $\eta$ and $\tilde{\eta}$ agree within 2%, and most are identical to an accuracy of
three decimal places. Therefore, the approximation $\eta \approx \tilde{\eta}$ is valid.


Table D.1: Gaussian noise

                      SNR = 0 dB            SNR = -10 dB
test                  computed  measured    computed  measured
linear                2.000     1.999       0.200     0.199
sign                  1.139     1.139       0.126     0.126
dead-zone             1.493     1.493       0.161     0.161
robust, ε = 0.1       1.520     1.522       0.155     0.153
robust, ε = 0.01      1.822     1.812       0.193     0.192
robust, δ = 0.2       1.495     1.495       0.136     0.137
robust, δ = 0.05      1.723     1.721       0.179     0.177
Table D.2: Gauss-Gauss noise, ε = 0.01

                      SNR = 0 dB            SNR = -10 dB
test                  computed  measured    computed  measured
linear                0.389     0.393       0.093     0.094
sign                  2.006     2.006       0.243     0.243
dead-zone             2.603     2.605       0.309     0.309
robust, ε = 0.01      2.995     3.018       0.369     0.366

Table D.3: Gauss-Gauss noise, ε = 0.1

                      SNR = 0 dB            SNR = -10 dB
test                  computed  measured    computed  measured
linear                0.730     0.732       0.138     0.139
sign                  3.001     3.001       0.976     0.976
dead-zone             3.057     3.057       1.135     1.135
robust, ε = 0.1       3.040     3.060       1.198     1.191

Table D.4: Least favorable ε-contaminated noise, ε = 0.01

                      SNR = 0 dB            SNR = -10 dB
test                  computed  measured    computed  measured
linear                1.511     1.515       0.188     0.191
sign                  1.240     1.240       0.142     0.142
dead-zone             1.568     1.568       0.176     0.176
robust, ε = 0.01      1.780     1.780       0.204     0.203
Table D.5: Least favorable ε-contaminated noise, ε = 0.1

                      SNR = 0 dB            SNR = -10 dB
test                  computed  measured    computed  measured
linear                1.285     1.298       0.189     0.190
sign                  1.537     1.537       0.202     0.202
dead-zone             1.526     1.526       0.203     0.203
robust, ε = 0.1       1.579     1.567       0.233     0.233

Table D.6: Least favorable total variation noise, δ = 0.05

                      SNR = 0 dB            SNR = -10 dB
test                  computed  measured    computed  measured
linear                1.408     1.407       0.192     0.191
sign                  1.353     1.353       0.162     0.162
dead-zone             1.615     1.615       0.190     0.190
robust, δ = 0.05      1.742     1.741       0.214     0.213

Table D.7: Least favorable total variation noise, δ = 0.2

                      SNR = 0 dB            SNR = -10 dB
test                  computed  measured    computed  measured
linear                1.286     1.285       0.189     0.188
sign                  1.538     1.538       0.211     0.211
dead-zone             1.522     1.522       0.198     0.198
robust, δ = 0.2       1.580     1.583       0.241     0.240
Figure 3.33: Performance for Gaussian noise: ψ = 0 dB.

Figure 3.35: Performance for Gauss-Gauss noise: ε = 0.01, γ = 100, ψ = 0 dB.

Figure 3.36: Performance for Gauss-Gauss noise: ε = 0.01, γ = 100, ψ = -10 dB.

Figure 3.38: Performance for Gauss-Gauss noise: ε = 0.1, γ = 100, ψ = -10 dB.

Figure 3.40: Performance for least favorable ε-contaminated noise: ε

Figure 3.44: Performance for Gaussian noise: ψ

Figure 3.46: Performance for least favorable total variation noise: δ = 0.05, ψ = -10 dB.

Figure 3.48: Performance for least favorable total variation noise: δ = 0.2, ψ
3.7.E Efficacy Computations for the Weak Signal Case

The efficacy is

$$\mathcal{E} = \frac{\left(\int g'(x) f(x)\,dx\right)^2}{\int g^2(x) f(x)\,dx}$$

The computation of the two integrals, although tedious, is straightforward. Therefore, only the final expressions are given for each case.
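The efficacy definition above can be checked numerically. The following is a minimal sketch, not the procedure used in this work: it approximates the two integrals with a midpoint rule for zero-mean Gaussian noise. The function names, the integration range, and the clipped-linear form assumed for the robust detector, g(x) = x/σ0² limited to ±k, are illustrative assumptions.

```python
import math


def std_normal_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def gaussian_pdf(x, sigma):
    """Zero-mean Gaussian density with standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def efficacy(g, dg, pdf, lo=-40.0, hi=40.0, n=100_000):
    """Midpoint-rule estimate of (int g' f dx)^2 / int g^2 f dx."""
    step = (hi - lo) / n
    num = 0.0
    den = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * step
        fx = pdf(x)
        num += dg(x) * fx
        den += g(x) ** 2 * fx
    return (num * step) ** 2 / (den * step)


def clipped_linear_efficacy(k, sigma):
    """Closed form for the assumed g(x) = clip(x / sigma^2, -k, k), N(0, sigma^2) noise."""
    inner = (2.0 * std_normal_cdf(k * sigma) - 1.0) / sigma ** 2
    tails = 2.0 * k ** 2 * std_normal_cdf(-k * sigma)
    edge = math.sqrt(2.0 / math.pi) * (k / sigma) * math.exp(-0.5 * (k * sigma) ** 2)
    return inner ** 2 / (tails - edge + inner)


sigma0, k = 2.0, 0.4  # illustrative values only
pdf = lambda x: gaussian_pdf(x, sigma0)

# Linear detector g(x) = x: the efficacy should equal 1 / sigma0^2.
linear = efficacy(lambda x: x, lambda x: 1.0, pdf)

# Clipped-linear detector: g'(x) = 1 / sigma0^2 inside the linear region, 0 outside.
clipped = efficacy(
    lambda x: max(-k, min(k, x / sigma0 ** 2)),
    lambda x: 1.0 / sigma0 ** 2 if abs(x) < k * sigma0 ** 2 else 0.0,
    pdf,
)
print(linear, 1.0 / sigma0 ** 2)
print(clipped, clipped_linear_efficacy(k, sigma0))
```

Limiting the processor lowers the efficacy in purely Gaussian noise, which is the price paid for protection against heavier-tailed noise.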


Linear Detector, Gaussian Noise

$$\mathcal{E} = \frac{1}{\sigma_0^2}$$

Sign Detector, Gaussian Noise

$$\mathcal{E} = \frac{2}{\pi \sigma_0^2}$$


Dead-zone  Limiter,  Gaussian  Noise 


7 T<7n 


1  -1 


$ 


CO 


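Robust Detector, Gaussian Noise

$$\mathcal{E} = \frac{\left\{\sigma_0^{-2}\left[2\Phi(k\sigma_0) - 1\right]\right\}^2}{2k^2\Phi(-k\sigma_0) - \sqrt{\frac{2}{\pi}}\,\frac{k}{\sigma_0}\,e^{-k^2\sigma_0^2/2} + \sigma_0^{-2}\left[2\Phi(k\sigma_0) - 1\right]}$$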

Linear Detector, Gauss-Gauss Noise

$$\mathcal{E} = \frac{1}{(1 - \varepsilon)\sigma_0^2 + \varepsilon\sigma_1^2}$$
Sign Detector, Gauss-Gauss Noise

$$\mathcal{E} = \frac{2}{\pi}\left[\frac{1 - \varepsilon}{\sigma_0} + \frac{\varepsilon}{\sigma_1}\right]^2$$


Dead-zone  Limiter,  Gauss-Gauss  Noise 


{l=S-e~<P  l2°l  _L  S-e-<P/2a-l\2 

l  q-p _  o-i _ J_ 


Robust  Detector,  Gauss-Gauss  Noise 


Tlien 


h  =  2k2$(-ko0) 


2  ( —ko  n 

I2  =  2&2$  - £ 

V 


fee-*=^o2/2  +  a-2  [2$  (A:cro)  -  1] 


fce-fcVo/2<r?  +  2$ 


{<70-*[2(l-t)$(far,)  +  fe*  (g)-!]^ 


(1  —  e)/i  +  e/2 

Linear  Detector,  Least  Favorable  e-contaminated  Noise 

E  =  (~2e~k2<r°/2  jc1  “  £)  ~Y  +  “  *K2}  +  -o2  [2$(Ar<r0)  -  1] 

Sign  Detector,  Least  Favorable  e-contaminated  Noise 


E  =  -2(l-eY 

TTCg 


Dead-zone  Limiter,  Least  Favorable  e-contaminated  Noise 


.2  1  2 

(1— e)e—<1  /i70 


5  =  J  \/f<rofce  fc2°r°/2+'7r<ro 


[2  ^(1-0  cfcVg/2  c-kd 

7T  CTq 


d  <  A:Cq 


d  >  A:Cq 
Chapter 4


Robust  Quickest  Detection  Under 
Mean  &  Covariance  Uncertainty 

4.1  Introduction 

In  this  chapter,  we  continue  our  investigation  of  robust  quickest  detection  procedures. 
In  the  previous  chapter,  robust  procedures  were  investigated  for  the  case  where  the 
noise  distributions  were  known  only  to  lie  in  some  uncertainty  class.  Here,  we  consider 
the  case  where  the  noise  is  multivariate  Gaussian,  and  where  the  uncertainty  exists 
in  the  mean  vector  and/or  covariance  matrix. 

For the classical known-signal hypothesis testing problem in Gaussian noise, it is well known that the procedure that maximizes the probability of detection for a given false alarm probability (i.e., that satisfies the Neyman-Pearson criterion) is the matched filter, which is the filter that maximizes the output signal-to-noise ratio (SNR), followed by a comparator [9]. It is perhaps then not surprising that the matched filter is also the optimal processor for the quickest detection problem when the noise is Gaussian: the log-likelihood ratio is the optimal processor in the sense of Lorden (see Chapter 2), and for the Gaussian case the log-likelihood ratio is exactly the matched filter.
The optimal processor when the noise is Gaussian is linear, and is based only on the first and second order statistics. If the noise is non-Gaussian, the optimum processor will in general be nonlinear, and hence the matched filter processor will be suboptimal. Unfortunately, the derivation of the optimum processor for non-Gaussian noise is not always straightforward, particularly if the noise cannot be characterized exactly. By comparison, the matched filter maximizing the SNR is a “common, simple and generally well-founded engineering technique [3],” and therefore may be an attractive option even when the noise is not Gaussian.¹

Much  work  has  been  done  in  the  area  of  robust  matched  filtering  (for  example, 
see  [4]  and  references  therein,  and  for  a  general  treatment,  see  [8]);  of  particular 
interest  in  this  chapter  is  the  work  on  minimax  robust  discrete-time  matched  filtering 
of Verdú and Poor [10]. The main objective of this chapter is to formally establish
the  connection  between  robust  matched  filtering  and  robust  quickest  detection  in 
multivariate  Gaussian  noise.  Once  this  is  done,  we  will  see  that  all  of  the  techniques 
from  the  former  can  be  used  directly  to  obtain  solutions  for  the  latter. 

Section  2  contains  the  main  result  of  this  chapter.  After  stating  the  problem 
formally,  it  is  shown  that  the  minimax  robust  matched  filter  is  exactly  the  optimal 
processor  in  the  sense  of  Lorden.  This  is  done  by  applying  the  minimax  criterion 
directly  to  the  asymptotic  performance  measure.  Different  types  of  signal  and  noise 
uncertainty  are  investigated  in  Section  3.  It  is  shown  that  many  of  the  results  from  [10] 
showing  how  to  obtain  the  robust  matched  filters  can  also  be  used  to  derive  the  robust 
quickest  detector.  The  asymptotic  performance  measures  for  the  robust  procedures 
are  computed,  and  several  examples  are  provided.  In  Section  4,  two  related  issues 
are  addressed.  First,  an  alternative  “minimax  tuning”  method  which  appears  in  [1]  is 
shown  to  be  equivalent  to  our  approach.  Finally,  the  computation  of  the  asymptotic 
performance  is  discussed  for  the  more  general  case  when  the  noise  is  non-Gaussian. 


¹ An exception is when the noise is impulsive. See [3] for details.
4.2 General Solution for the Robust Quickest Detector

Consider the following disorder problem. The multivariate real-valued independent random variables $\mathbf{x}_1, \mathbf{x}_2, \ldots$ are observed sequentially, where $\mathbf{x}_i$ is generated under $H_0$ for $i = 1, \ldots, m-1$ and under $H_1$ for $i = m, m+1, \ldots$ The two hypotheses are multivariate Gaussian:

$$H_0: \mathbf{x}_i \sim N(-\mathbf{s}, \Sigma)$$

$$H_1: \mathbf{x}_i \sim N(\mathbf{s}, \Sigma)$$

where $\mathbf{s} \in \mathbb{R}^k$, $\Sigma \in \mathbb{R}^{k \times k}$, and $\Sigma$ is positive definite. The means are chosen to be symmetric for simplicity, but without loss of generality. Furthermore, it is known only that $(\mathbf{s}, \Sigma) \in \mathcal{S} \times \mathcal{N}$, where $\mathcal{S}$ and $\mathcal{N}$ are independent signal and noise uncertainty classes, respectively.² In this work we consider classes of the form:

$$\mathcal{S} = \{\mathbf{s} : \|\mathbf{s} - \mathbf{s}_0\| \le \Delta\}$$

$$\mathcal{N} = \{\Sigma : \|\Sigma - \Sigma_0\| \le \varepsilon\}$$

Thus, the uncertainty is modelled as a deviation from the nominal parameters $\mathbf{s}_0$ and $\Sigma_0$. The particular norms used will be discussed in the next section.³
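Page’s test is defined here exactly as in Chapter 2; namely, the statistic

$$S_n = \max\{S_{n-1} + g(\mathbf{x}_n),\, 0\}$$

is recursively computed, and a disorder is declared when the stopping time

$$N = \inf\{n \mid S_n > h\}$$

occurs, where $h > 0$ is some prespecified threshold. For Gaussian noise, the optimal processor is linear; that is, the processor $g(\mathbf{x}) = \mathbf{h}^T\mathbf{x}$, for some $\mathbf{h} \in \mathbb{R}^k$. The SNR for a single snapshot is

$$\rho(\mathbf{h}; \mathbf{s}, \Sigma) = \frac{|\langle \mathbf{h}, \mathbf{s}\rangle|^2}{\langle \mathbf{h}, \Sigma\mathbf{h}\rangle}$$

and a direct application of the Cauchy-Schwarz inequality

$$|\langle \mathbf{h}, \mathbf{s}\rangle|^2 \le \langle \mathbf{h}, \Sigma\mathbf{h}\rangle\,\langle \mathbf{s}, \Sigma^{-1}\mathbf{s}\rangle$$

reveals that $\rho$ is maximized when $\mathbf{h}$ satisfies

$$\Sigma\mathbf{h} = k\mathbf{s} \qquad (4.1)$$

where $k$ is any nonzero real constant; in this case, $\rho(\mathbf{h}; \mathbf{s}, \Sigma) = \langle \mathbf{s}, \Sigma^{-1}\mathbf{s}\rangle$.⁴ Equation (4.1) defines the well-known discrete-time matched filter, the processor that maximizes the SNR. Notice that $\rho(\mathbf{h}; \mathbf{s}, \Sigma)$ is the same under either $H_0$ or $H_1$, since we assumed that the respective mean vectors are symmetric.

² Notice that this is equivalent to saying that $(f_0, f_1) \in \mathcal{F}_0 \times \mathcal{F}_1$, where

$$\mathcal{F}_0 = \left\{f : f(\mathbf{x}) = |2\pi\Sigma|^{-1/2} \exp\{-\tfrac{1}{2}(\mathbf{x} + \mathbf{s})^T\Sigma^{-1}(\mathbf{x} + \mathbf{s})\},\ \mathbf{s} \in \mathcal{S},\ \Sigma \in \mathcal{N}\right\}$$

$$\mathcal{F}_1 = \left\{f : f(\mathbf{x}) = |2\pi\Sigma|^{-1/2} \exp\{-\tfrac{1}{2}(\mathbf{x} - \mathbf{s})^T\Sigma^{-1}(\mathbf{x} - \mathbf{s})\},\ \mathbf{s} \in \mathcal{S},\ \Sigma \in \mathcal{N}\right\}$$

In this chapter, the former notation will be used so that it is clear that the uncertainty lies only in the mean and covariance.

³ It is not difficult to verify that $\mathcal{S}$ and $\mathcal{N}$ are also convex.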
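The recursion for Page's test and the single-snapshot SNR can be sketched in a few lines. This is only an illustration under stated assumptions: the names `page_test` and `snr` are hypothetical, the observations are taken noise-free (exactly −s before the change and +s after it) so that the run is deterministic, and the numerical values of s and Σ are chosen for the sketch.

```python
def dot(a, b):
    """Inner product <a, b> = a^T b."""
    return sum(x * y for x, y in zip(a, b))


def page_test(samples, h_vec, threshold):
    """Run Page's test with g(x) = h^T x.

    Returns the 1-based stopping time N = inf{n : S_n > threshold},
    or None if the threshold is never crossed.
    """
    stat = 0.0
    for n, x in enumerate(samples, start=1):
        stat = max(stat + dot(h_vec, x), 0.0)
        if stat > threshold:
            return n
    return None


def snr(h_vec, s_vec, cov):
    """Single-snapshot SNR rho(h; s, Sigma) = <h, s>^2 / <h, Sigma h>."""
    cov_h = [dot(row, h_vec) for row in cov]
    return dot(h_vec, s_vec) ** 2 / dot(h_vec, cov_h)


# Illustrative parameters (assumed for this sketch).
s = [3.0, 1.0]
cov = [[10.0, 0.0], [0.0, 1.0]]
h = [0.3, 1.0]  # matched filter: Sigma h = s, i.e. k = 1 in (4.1)

# Noise-free observations: five samples at -s, then the change to +s.
samples = [[-3.0, -1.0]] * 5 + [[3.0, 1.0]] * 10
stop = page_test(samples, h, threshold=5.0)
rho = snr(h, s, cov)
print(stop, rho)
```

With these values the statistic stays at zero before the change and grows by 1.9 per sample afterwards, so the threshold of 5 is crossed three samples after the disorder; the SNR of the matched filter equals ⟨s, Σ⁻¹s⟩ = 1.9.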

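The minimax solution for the robust matched filtering problem, $(\mathbf{h}_R, (\mathbf{s}_L, \Sigma_L))$, is the solution of

$$\max_{\mathbf{h} \in \mathcal{H}} \left\{ \min_{(\mathbf{s}, \Sigma) \in \mathcal{S} \times \mathcal{N}} \rho(\mathbf{h}; \mathbf{s}, \Sigma) \right\}$$

where $(\mathbf{s}_L, \Sigma_L)$ denotes the least favorable pair in $\mathcal{S} \times \mathcal{N}$, and $\Sigma_L\mathbf{h}_R = k\mathbf{s}_L$. Here $\mathcal{H} = \mathbb{R}^k$, but in general $\mathcal{H}$ can be an arbitrary Hilbert space. In [8], it is shown that a saddle point solution exists for this problem, that is:

$$\max_{\mathbf{h} \in \mathcal{H}} \rho(\mathbf{h}; \mathbf{s}_L, \Sigma_L) = \rho(\mathbf{h}_R; \mathbf{s}_L, \Sigma_L) = \min_{(\mathbf{s}, \Sigma) \in \mathcal{S} \times \mathcal{N}} \rho(\mathbf{h}_R; \mathbf{s}, \Sigma) \qquad (4.2)$$

This allows the maximization and minimization to be determined separately rather than jointly. The following lemma (reworded slightly), which appears in [10], characterizes the saddle point solution:

⁴ $\langle \mathbf{x}, \mathbf{y}\rangle = \mathbf{x}^T\mathbf{y}$ is the inner product of $\mathbf{x}$ and $\mathbf{y}$.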
Lemma 2: $(\mathbf{h}_R, (\mathbf{s}_L, \Sigma_L))$ is a saddle point for the robust matched filtering problem if and only if

1. $\Sigma_L\mathbf{h}_R = \mathbf{s}_L$,

2. $|\langle \mathbf{s}_L, \mathbf{h}_R\rangle| \le |\langle \mathbf{s}, \mathbf{h}_R\rangle|$, $\forall\, \mathbf{s} \in \mathcal{S}$,

3. $0 \le \langle \mathbf{h}_R, (\Sigma_L - \Sigma)\mathbf{h}_R\rangle$, $\forall\, \Sigma \in \mathcal{N}$.

We now show that the robust processor for Page’s test is $g_R(\mathbf{x}) = c\,\mathbf{h}_R^T\mathbf{x}$, where $c$ is any positive real constant, thereby establishing a formal connection between minimax robust matched filtering and quickest detection.

Proposition 4: If $(\mathbf{h}_R, (\mathbf{s}_L, \Sigma_L))$ is the minimax solution for the matched filtering problem, then $(g_R, (\mathbf{s}_L, \Sigma_L))$ is the asymptotic minimax solution for the quickest detection problem, where $\Sigma_L\mathbf{h}_R = k\mathbf{s}_L$, $k \ne 0$, and $g_R(\mathbf{x}) = c\,\mathbf{h}_R^T\mathbf{x}$; $c > 0$.

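Proof:

Let $f_{L0} \sim N(-\mathbf{s}_L, \Sigma_L)$ and $f_{L1} \sim N(\mathbf{s}_L, \Sigma_L)$, and assume $k = 1$ for convenience (and without loss of generality, since $k$ can otherwise be incorporated into $\mathbf{s}_L$). We would like to show that (4.2) implies

$$\max_{g} \eta(g; \mathbf{s}_L, \Sigma_L) = \eta(g_R; \mathbf{s}_L, \Sigma_L) = \min_{(\mathbf{s}, \Sigma) \in \mathcal{S} \times \mathcal{N}} \eta(g_R; \mathbf{s}, \Sigma) \qquad (4.3)$$

Since $\eta$ is maximized when $g$ is the log-likelihood ratio (cf. Chapter 2), the left equality is achieved when $g(\mathbf{x}) = \log \frac{f_{L1}(\mathbf{x})}{f_{L0}(\mathbf{x})} = 2\mathbf{s}_L^T\Sigma_L^{-1}\mathbf{x} = 2\mathbf{h}_R^T\mathbf{x}$. However, recall that i.) when $g(\mathbf{x})$ is the log-likelihood ratio, $\eta = \tilde{\eta}$, and ii.) $\tilde{\eta}$ is invariant to scale changes. Thus, the left equality is also achieved for $g(\mathbf{x}) = c\,\mathbf{h}_R^T\mathbf{x}$ where $c$ is any positive real constant.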

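Now we need to show that the right equality holds when $(\mathbf{s}_L, \Sigma_L)$ is the least favorable pair for the robust matched filtering problem. To do this, first recall the definition of the lower bound $\tilde{\eta}$ (cf. Chapter 2) for Page’s test implemented with processor $g(\mathbf{x})$:

$$\tilde{\eta} = \omega_0\, E[g(\mathbf{x}) \mid H_1] \qquad (4.4)$$

where $\omega_0$ is the non-zero root of the moment generating function equality

$$E[\exp\{\omega_0\, g(\mathbf{x})\} \mid H_0] = 1 \qquad (4.5)$$

Let $f_0 \sim N(-\mathbf{s}, \Sigma)$ and $f_1 \sim N(\mathbf{s}, \Sigma)$. When the robust processor is used, $\omega_0$ satisfies

$$1 = \int e^{\omega_0 g_R(\mathbf{x})} f_0(\mathbf{x})\,d\mathbf{x} = \int e^{\mathbf{v}^T\mathbf{x}} f_0(\mathbf{x})\,d\mathbf{x}$$

where $\mathbf{v} = c\,\omega_0\mathbf{h}_R$. Now observe that this last expression is just the vector moment generating function as a function of $\mathbf{v}$. Therefore,

$$1 = \exp\{-\mathbf{v}^T\mathbf{s} + \tfrac{1}{2}\mathbf{v}^T\Sigma\mathbf{v}\}$$

Taking the log of both sides, rearranging terms, and substituting back in for $\mathbf{v}$, we obtain

$$c\,\omega_0\,\mathbf{h}_R^T\mathbf{s} = \tfrac{1}{2}c^2\omega_0^2\,\mathbf{h}_R^T\Sigma\mathbf{h}_R$$

of which the nonzero solution is

$$\omega_0 = \frac{2\,\mathbf{h}_R^T\mathbf{s}}{c\,\mathbf{h}_R^T\Sigma\mathbf{h}_R}$$

Also,

$$E[g_R(\mathbf{x}) \mid H_1] = c\,\mathbf{h}_R^T\mathbf{s}$$

Therefore, the lower bound (4.4) is

$$\tilde{\eta}(g_R; \mathbf{s}, \Sigma) = 2\,\frac{(\mathbf{h}_R^T\mathbf{s})^2}{\mathbf{h}_R^T\Sigma\mathbf{h}_R} = 2\,\frac{|\langle \mathbf{h}_R, \mathbf{s}\rangle|^2}{\langle \mathbf{h}_R, \Sigma\mathbf{h}_R\rangle} \qquad (4.6)$$

(Notice that this is independent of the scale factor $c$, as expected).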
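The steps of this derivation can be verified numerically. The sketch below is an illustration only: `lower_bound` and `mgf_under_h0` are hypothetical helper names, and the vectors and covariance are arbitrary values chosen for the check. It computes ω0 and the lower bound for a linear processor g(x) = c hᵀx, confirms that the Gaussian moment generating function equals one at ω0, and confirms that the bound does not depend on c.

```python
import math


def dot(a, b):
    """Inner product a^T b."""
    return sum(x * y for x, y in zip(a, b))


def mat_vec(m, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, v) for row in m]


def lower_bound(h_vec, s_vec, cov, c=1.0):
    """Return (omega0, eta) for g(x) = c h^T x with x ~ N(-s, cov) or N(s, cov)."""
    hs = dot(h_vec, s_vec)
    h_cov_h = dot(h_vec, mat_vec(cov, h_vec))
    omega0 = 2.0 * hs / (c * h_cov_h)   # nonzero root of the MGF equality
    eta = omega0 * c * hs               # omega0 * E[g(x) | H1]
    return omega0, eta


def mgf_under_h0(omega, h_vec, s_vec, cov, c=1.0):
    """E[exp(omega * c h^T x) | H0] for x ~ N(-s, cov), in closed form."""
    v = [omega * c * hi for hi in h_vec]
    return math.exp(-dot(v, s_vec) + 0.5 * dot(v, mat_vec(cov, v)))


# Arbitrary illustrative values (assumptions of this sketch).
h = [1.0, 1.0]
s = [1.0, 2.0]
cov = [[2.0, 0.5], [0.5, 1.0]]

omega_a, eta_a = lower_bound(h, s, cov, c=1.0)
omega_b, eta_b = lower_bound(h, s, cov, c=5.0)
print(omega_a, eta_a, mgf_under_h0(omega_a, h, s, cov, c=1.0))
print(omega_b, eta_b, mgf_under_h0(omega_b, h, s, cov, c=5.0))
```

Changing c rescales ω0 by 1/c and the mean of g by c, so their product is unchanged.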


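The right expression in (4.3) can now be lower bounded as

$$\min_{(\mathbf{s}, \Sigma) \in \mathcal{S} \times \mathcal{N}} \eta(g_R; \mathbf{s}, \Sigma) \ge \min_{(\mathbf{s}, \Sigma) \in \mathcal{S} \times \mathcal{N}} 2\,\frac{|\langle \mathbf{h}_R, \mathbf{s}\rangle|^2}{\langle \mathbf{h}_R, \Sigma\mathbf{h}_R\rangle} = 2 \min_{(\mathbf{s}, \Sigma) \in \mathcal{S} \times \mathcal{N}} \rho(\mathbf{h}_R; \mathbf{s}, \Sigma) = 2\rho(\mathbf{h}_R; \mathbf{s}_L, \Sigma_L) \qquad (4.7)$$

where the last expression follows from (4.2), and $(\mathbf{s}_L, \Sigma_L)$ is the least favorable pair for the robust matched filtering problem. Now

$$\rho(\mathbf{h}_R; \mathbf{s}_L, \Sigma_L) = \langle \mathbf{s}_L, \Sigma_L^{-1}\mathbf{s}_L\rangle = \tfrac{1}{2} I(f_{L1}, f_{L0}) = \tfrac{1}{2}\eta(g_R; \mathbf{s}_L, \Sigma_L) \qquad (4.8)$$

where $I(\cdot, \cdot)$ is the K-L divergence. Thus, (4.7) and (4.8) imply that

$$\min_{(\mathbf{s}, \Sigma) \in \mathcal{S} \times \mathcal{N}} \eta(g_R; \mathbf{s}, \Sigma) \ge \eta(g_R; \mathbf{s}_L, \Sigma_L) \qquad (4.9)$$

Conversely, since the least favorable pair lies in $\mathcal{S} \times \mathcal{N}$, we also have that

$$\min_{(\mathbf{s}, \Sigma) \in \mathcal{S} \times \mathcal{N}} \eta(g_R; \mathbf{s}, \Sigma) = \min\left\{ \min_{(\mathbf{s}, \Sigma) \ne (\mathbf{s}_L, \Sigma_L)} \eta(g_R; \mathbf{s}, \Sigma),\ \eta(g_R; \mathbf{s}_L, \Sigma_L) \right\} \le \eta(g_R; \mathbf{s}_L, \Sigma_L) \qquad (4.10)$$

Finally, (4.9) and (4.10) together imply

$$\min_{(\mathbf{s}, \Sigma) \in \mathcal{S} \times \mathcal{N}} \eta(g_R; \mathbf{s}, \Sigma) = \eta(g_R; \mathbf{s}_L, \Sigma_L)$$

and so (4.3) holds.

∎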


We  have  seen  that  for  the  Gaussian  case,  the  least  favorable  pairs  can  be  obtained 
via  the  matched  filtering  formulation.  Now,  we  would  like  to  state  some  of  the  results 
for  particular  types  of  signal  and  noise  uncertainty  classes. 
4.3 Particular Solutions for Various Uncertainty Classes

4.3.1 Signal Uncertainty

In this section, three signal uncertainty classes are considered, all of which are based on the $l_p$ norm; it is assumed that the noise covariance, $\Sigma_0$, is fixed and known. In [10], necessary and sufficient conditions for $(\mathbf{h}_L, (\mathbf{s}_L, \Sigma_0))$ to be a saddle point solution of the minimax robust matched filtering problem (i.e., to satisfy (4.2)) are given. Proposition 4 of the previous section confirms that these same conditions can be used to obtain the optimal processor for the quickest detection problem. The asymptotic performance measures for the nominal and robust procedures are determined below. Finally, a simple example illustrates the utility of the previous results from robust matched filtering in the design of robust quickest detectors.

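Types of Signal Uncertainty

Since the robust processor is that filter which is matched to the least favorable signal, we have that $\mathbf{h}_L = \Sigma_0^{-1}\mathbf{s}_L$ regardless of the choice of the class $\mathcal{S}$. The three types of uncertainty classes are listed below, along with the necessary and sufficient conditions derived in [10].⁵

• Mean-absolute distortion ($l_1$ norm): $\mathcal{S}_1 = \{\mathbf{s} : \|\mathbf{s} - \mathbf{s}_0\|_1 \le \Delta\}$

For $i = 0, 1, \ldots, k-1$:

$$s_{Li} = s_{0i} - \delta_i\,\mathrm{sgn}(h_{Li})$$

where

$$\delta_i = 0 \quad \text{if } |h_{Li}| < M = \max_j |h_{Lj}|$$

and

$$\sum_{i=0}^{k-1} \delta_i = \Delta$$

is satisfied. If the noise is uncorrelated, that is $\Sigma_0 = \mathrm{diag}(\sigma_0^2, \ldots, \sigma_{k-1}^2)$, then the following closed-form expression can be derived:

$$h_{Li} = \begin{cases} h_{0i}, & |h_{0i}| \le C \\ C\,\mathrm{sgn}(h_{0i}), & |h_{0i}| > C \end{cases}$$

where $C$ satisfies

$$\sum_{i=0}^{k-1} \sigma_i^2\,(|h_{0i}| - C)^+ = \Delta$$

• Mean-square distortion ($l_2$ norm): $\mathcal{S}_2 = \{\mathbf{s} : \|\mathbf{s} - \mathbf{s}_0\|_2 \le \Delta\}$

$$\mathbf{h}_L = (\Sigma_0 + \sigma^2 I)^{-1}\mathbf{s}_0$$

$$\mathbf{s}_L = \mathbf{s}_0 - \sigma^2\mathbf{h}_L$$

where $\sigma^2$ is obtained by solving

$$\Delta = \sigma^2\,\|(\Sigma_0 + \sigma^2 I)^{-1}\mathbf{s}_0\|$$

• Maximum-absolute distortion ($l_\infty$ norm): $\mathcal{S}_3 = \{\mathbf{s} : \|\mathbf{s} - \mathbf{s}_0\|_\infty \le \Delta\}$

For $i = 0, 1, \ldots, k-1$:

$$s_{Li} = \begin{cases} s_{0i} - \Delta, & h_{Li} > 0 \\ s_{0i} + \Delta, & h_{Li} < 0 \end{cases}$$

If the noise is uncorrelated, then

$$s_{Li} = \begin{cases} s_{0i} - \Delta, & \Delta < s_{0i} \\ 0, & -\Delta \le s_{0i} \le \Delta \\ s_{0i} + \Delta, & s_{0i} < -\Delta \end{cases}$$

⁵ For the most part, the notation we use here is the same as in [10].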
Unfortunately, a general closed-form solution is only available for the class $\mathcal{S}_2$; for the others, a closed-form solution is available only for uncorrelated noise. It has been suggested that numerical techniques might be used to obtain $\mathbf{h}_L$ for $\mathcal{S}_1$ and $\mathcal{S}_3$. This point is discussed further at the end of this section.
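For uncorrelated noise, the closed-form conditions listed above translate directly into code. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the noise covariance is diagonal and passed as a list of variances, Δ is assumed smaller than the corresponding norm of s0, and a plain bisection stands in for whatever root finder one prefers. The values s0 = (3, 1), variances (10, 1) and Δ = 0.5 are used only as a test case.

```python
import math


def bisect(f, lo, hi, iters=200):
    """Root of a monotone function f on [lo, hi], found by bisection."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0.0) == (f_lo > 0.0):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lf_signal_l1(s0, var, delta):
    """l1 class: clip the nominal filter h0 = s0 / var at the level C."""
    if delta >= sum(abs(s) for s in s0):
        raise ValueError("delta must be smaller than the l1 norm of s0")
    h0 = [s / v for s, v in zip(s0, var)]

    def excess(c):
        return sum(v * max(abs(h) - c, 0.0) for h, v in zip(h0, var)) - delta

    c = bisect(excess, 0.0, max(abs(h) for h in h0))
    h_l = [h if abs(h) <= c else math.copysign(c, h) for h in h0]
    s_l = [v * h for v, h in zip(var, h_l)]  # s_L = Sigma_0 h_L
    return h_l, s_l


def lf_signal_l2(s0, var, delta):
    """l2 class: solve delta = sig2 * ||(Sigma_0 + sig2 I)^{-1} s0|| for sig2."""
    if delta >= math.sqrt(sum(s * s for s in s0)):
        raise ValueError("delta must be smaller than the l2 norm of s0")

    def gap(sig2):
        norm = math.sqrt(sum((s / (v + sig2)) ** 2 for s, v in zip(s0, var)))
        return sig2 * norm - delta

    hi = 1.0
    while gap(hi) < 0.0:  # expand the bracket until the root is enclosed
        hi *= 2.0
    sig2 = bisect(gap, 0.0, hi)
    h_l = [s / (v + sig2) for s, v in zip(s0, var)]
    s_l = [s - sig2 * h for s, h in zip(s0, h_l)]
    return sig2, h_l, s_l


def lf_signal_linf(s0, var, delta):
    """l-infinity class: shrink every component of s0 toward zero by delta."""
    s_l = [s - delta if s > delta else s + delta if s < -delta else 0.0 for s in s0]
    h_l = [s / v for s, v in zip(s_l, var)]
    return h_l, s_l


s0, var, delta = [3.0, 1.0], [10.0, 1.0], 0.5
h1, s1 = lf_signal_l1(s0, var, delta)
sig2, h2, s2 = lf_signal_l2(s0, var, delta)
h3, s3 = lf_signal_linf(s0, var, delta)
print(s1, sig2, s2, s3)
```

The bisection relies on the left side of each scalar equation being monotone in its unknown, which holds for both the clipping level C and σ².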


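Asymptotic Performance Under Signal Uncertainty

We would like to compare the robust quickest detector to the nominal version (that is, the one which is optimal for the nominal parameters $(\mathbf{s}_0, \Sigma_0)$) for different values of $\mathbf{s}$. To do this, we compute $\tilde{\eta}$ for each case of interest, as shown below.

In the proof of Proposition 4, the lower bound on asymptotic performance for the robust procedure ($g_R(\mathbf{x}) = \mathbf{h}_R^T\mathbf{x}$) when the true parameters are $(\mathbf{s}, \Sigma)$ was derived; the result is given in (4.6). This same derivation applies to the case where the assumed operating point is $(\mathbf{s}_1, \Sigma_0)$, but the true pair is $(\mathbf{s}_2, \Sigma_0)$; one simply makes the substitutions $g_R(\mathbf{x}) \leftarrow g_1(\mathbf{x}) = \mathbf{h}_1^T\mathbf{x}$, $\mathbf{h}_R \leftarrow \mathbf{h}_1 = \Sigma_0^{-1}\mathbf{s}_1$ and $\mathbf{s} \leftarrow \mathbf{s}_2$ in (4.6). The result is

$$\tilde{\eta}(g_1; \mathbf{s}_2, \Sigma_0) = 2\,\frac{|\langle \mathbf{h}_1, \mathbf{s}_2\rangle|^2}{\langle \mathbf{h}_1, \Sigma_0\mathbf{h}_1\rangle} \qquad (4.11)$$

Thus, $\tilde{\eta}$ for the robust test $g_L(\mathbf{x}) = \mathbf{h}_L^T\mathbf{x}$ operating at the nominal point is given by simply substituting $\mathbf{s}_L \to \mathbf{s}_1$ and $\mathbf{s}_0 \to \mathbf{s}_2$. This gives:

$$\tilde{\eta}(g_L; \mathbf{s}_0, \Sigma_0) = 2\,\frac{(\mathbf{s}_L^T\Sigma_0^{-1}\mathbf{s}_0)^2}{\mathbf{s}_L^T\Sigma_0^{-1}\mathbf{s}_L}$$

Similarly, $\tilde{\eta}$ for the nominal procedure when the least favorable signal is present is obtained when $g_0(\mathbf{x}) = \mathbf{h}_0^T\mathbf{x}$, where $\mathbf{s}_0 \to \mathbf{s}_1$ and $\mathbf{s}_L \to \mathbf{s}_2$ in (4.11); we have:

$$\tilde{\eta}(g_0; \mathbf{s}_L, \Sigma_0) = 2\,\frac{(\mathbf{s}_0^T\Sigma_0^{-1}\mathbf{s}_L)^2}{\mathbf{s}_0^T\Sigma_0^{-1}\mathbf{s}_0}$$

In Chapter 2, we saw that when the log-likelihood ratio is used to process the data, then $\eta = \tilde{\eta}$, and $\eta$ is equal to the Kullback-Leibler (K-L) divergence. When $f_0 \sim N(-\mathbf{s}, \Sigma)$ and $f_1 \sim N(\mathbf{s}, \Sigma)$, it is easy to verify that

$$I(f_1, f_0) = 2\,\mathbf{s}^T\Sigma^{-1}\mathbf{s}$$

Thus, $\eta(g_0; \mathbf{s}_0, \Sigma_0) = 2\,\mathbf{s}_0^T\Sigma_0^{-1}\mathbf{s}_0$ and $\eta(g_L; \mathbf{s}_L, \Sigma_0) = 2\,\mathbf{s}_L^T\Sigma_0^{-1}\mathbf{s}_L$.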

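Example

Here, we present an example which graphically illustrates the decision regions for each type of signal uncertainty, following the general design procedures detailed earlier. It is desired to design a robust quickest detection procedure where the parameters are nominally

$$\mathbf{s}_0 = \begin{pmatrix} 3 \\ 1 \end{pmatrix} \quad \text{and} \quad \Sigma_0 = \begin{pmatrix} 10 & 0 \\ 0 & 1 \end{pmatrix}$$

with an uncertainty parameter of $\Delta = \tfrac{1}{2}$.

• $\mathbf{s} \in \mathcal{S}_1$

$$\mathbf{h}_0 = \Sigma_0^{-1}\mathbf{s}_0 = \begin{pmatrix} 0.3 \\ 1 \end{pmatrix}$$

Choose $C$ such that

$$10\,(0.3 - C)^+ + (1 - C)^+ = \Delta$$

When $\Delta = \tfrac{1}{2}$, $C = \tfrac{1}{2}$. Thus,

$$\mathbf{h}_L = \begin{pmatrix} 0.3 \\ 0.5 \end{pmatrix} \quad \text{and} \quad \mathbf{s}_L = \Sigma_0\mathbf{h}_L = \begin{pmatrix} 3 \\ 0.5 \end{pmatrix}$$

We can now compare the asymptotic performance for the nominal and robust procedures when the observations are $N(\pm\mathbf{s}_L, \Sigma_0)$. Using the results shown earlier in this section, we have

$$\tilde{\eta}(g_0; \mathbf{s}_L, \Sigma_0) = 2 \times \frac{\left[\begin{pmatrix} 3 & 1 \end{pmatrix}\begin{pmatrix} 10 & 0 \\ 0 & 1 \end{pmatrix}^{-1}\begin{pmatrix} 3 \\ 0.5 \end{pmatrix}\right]^2}{\begin{pmatrix} 3 & 1 \end{pmatrix}\begin{pmatrix} 10 & 0 \\ 0 & 1 \end{pmatrix}^{-1}\begin{pmatrix} 3 \\ 1 \end{pmatrix}} = 2.063$$

$$\eta(g_L; \mathbf{s}_L, \Sigma_0) = 2 \times \begin{pmatrix} 3 & 0.5 \end{pmatrix}\begin{pmatrix} 10 & 0 \\ 0 & 1 \end{pmatrix}^{-1}\begin{pmatrix} 3 \\ 0.5 \end{pmatrix} = 2.300$$

Therefore, for the least favorable signal $\mathbf{s}_L$, the robust quickest detector outperforms the nominal procedure, as expected.

• $\mathbf{s} \in \mathcal{S}_2$

First, solve for $\sigma^2$, where

$$\sigma^2\,\|(\Sigma_0 + \sigma^2 I)^{-1}\mathbf{s}_0\|_2 = \Delta$$

As pointed out in [10], the left side is monotone increasing in $\sigma^2$. Therefore, it is easy to iteratively determine $\sigma^2$ (for example, via a bisection routine): the result for $\Delta = \tfrac{1}{2}$ is $\sigma^2 = 0.8079$. Thus

$$\mathbf{h}_L = \begin{pmatrix} 0.278 \\ 0.553 \end{pmatrix} \quad \text{and} \quad \mathbf{s}_L = \begin{pmatrix} 2.775 \\ 0.553 \end{pmatrix}$$

For this signal uncertainty class, $\tilde{\eta}(g_0; \mathbf{s}_L, \Sigma_0) = 2.021$ and $\eta(g_L; \mathbf{s}_L, \Sigma_0) = 2.152$.

• $\mathbf{s} \in \mathcal{S}_3$

We have directly that

$$\mathbf{s}_L = \begin{pmatrix} 3 - \Delta \\ 1 - \Delta \end{pmatrix}$$

and so

$$\mathbf{h}_L = \begin{pmatrix} 0.25 \\ 0.5 \end{pmatrix}$$

Finally, $\tilde{\eta}(g_0; \mathbf{s}_L, \Sigma_0) = 1.645$ and $\eta(g_L; \mathbf{s}_L, \Sigma_0) = 1.750$.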


Notice that for fixed $\Delta$, both $\tilde{\eta}(g_0; \mathbf{s}_L, \Sigma_0)$ and $\eta(g_L; \mathbf{s}_L, \Sigma_0)$ decrease with each class. This fact is not surprising since $\mathcal{S}_1 \subset \mathcal{S}_2 \subset \mathcal{S}_3$. We see that the asymptotic performance decreases as the amount of assumed uncertainty grows, a fact that agrees with intuition.
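The performance figures quoted in the example can be reproduced from (4.11). The sketch below is illustrative only: `eta_lower_bound` is a hypothetical name, the covariance is assumed diagonal and given as a list of variances, and the least favorable signals are entered as the rounded values stated in the example.

```python
def eta_lower_bound(s_design, s_true, noise_var):
    """Evaluate 2 <h1, s2>^2 / <h1, Sigma_0 h1> with h1 = Sigma_0^{-1} s1.

    s_design is the signal the processor is matched to (s1), s_true is the
    signal actually present (s2), and noise_var holds the diagonal of Sigma_0.
    """
    h1 = [s / v for s, v in zip(s_design, noise_var)]
    num = sum(h * s for h, s in zip(h1, s_true)) ** 2
    den = sum(v * h * h for h, v in zip(h1, noise_var))
    return 2.0 * num / den


s0 = [3.0, 1.0]
noise_var = [10.0, 1.0]

# Least favorable signals for Delta = 1/2, as given (rounded) in the example.
least_favorable = {
    "l1": [3.0, 0.5],
    "l2": [2.775, 0.553],
    "linf": [2.5, 0.5],
}

results = {}
for name, s_l in least_favorable.items():
    nominal = eta_lower_bound(s0, s_l, noise_var)   # nominal processor, s_L present
    robust = eta_lower_bound(s_l, s_l, noise_var)   # robust processor, s_L present
    results[name] = (nominal, robust)
    print(f"{name}: nominal {nominal:.3f}, robust {robust:.3f}")
```

When the design and true signals coincide, the expression reduces to 2 sᵀΣ0⁻¹s, so the same function covers both the matched and the mismatched case.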

The uncertainty regions for the above examples are graphically illustrated in Figures 4.1 through 4.3. Let $g(\mathbf{x}) = \mathbf{h}^T\mathbf{x}$, where $\mathbf{h}$ is an arbitrary linear processor. Since $E[g(\mathbf{x}) \mid H_1] = -E[g(\mathbf{x}) \mid H_0] > 0$, observe that the decision regions in each case are separated by the hyperplane $\mathbf{h}^T\mathbf{x} = 0$ (the hyperplane is simply a line in these examples). For each signal class, the separating hyperplanes for the nominal and robust procedures are shown. The slopes of the boundaries are dependent on the noise covariance $\Sigma_0$. For example, when $\Sigma_0 = I$, the hyperplanes are perpendicular to the line connecting $-\mathbf{s}_0$ and $\mathbf{s}_0$ (or $-\mathbf{s}_L$ and $\mathbf{s}_L$); on the other hand, when the condition number of the covariance is not unity (the eigenvalues are not all the same), as in our example, the hyperplanes will be “skewed” with respect to this line.

We now examine the design tradeoffs involved in choosing $\Delta$. In Figure 4.4, the asymptotic performances of the nominal and robust procedures are compared as a function of $\Delta$ for the class $\mathcal{S}_2$ (similar plots can be obtained for the other two classes). The nominal pair $(\mathbf{s}_0, \Sigma_0)$ is the same as in the previous example. Each plot of $\eta$ versus $\Delta$ is labelled with a pair of the form $(g, \mathbf{s})$, which indicates that the procedure with processor $g(\mathbf{x})$ was used, but that the true mean vector was $\mathbf{s}$ (there is still no uncertainty in the covariance matrix). Notice that the robust procedure implemented when the true signal is the least favorable, $\mathbf{s}_L$, outperforms the nominal procedure for all choices of $\Delta$. However, this is at the price of reduced performance when the signal is in fact $\mathbf{s}_0$.

Figure 4.1: Signal uncertainty class $\mathcal{S}_1$.

Figure 4.2: Signal uncertainty class $\mathcal{S}_2$.

Figure 4.3: Signal uncertainty class $\mathcal{S}_3$.

Now suppose that a robust quickest detection procedure is designed for a signal distortion of $\Delta'$, but that the true distortion is $\Delta$; this scenario is depicted in Figure 4.5. Observe that the procedure with $\Delta' = 1.0$ exhibits performance that is almost identical to the optimal procedure (designed for distortion $\Delta$) when $0.75 < \Delta < 1.5$, but that the performance degenerates with respect to the optimal and nominal procedures for $\Delta < 0.5$. Similarly, the performance of the procedure designed with $\Delta' = 0.25$ is reasonable for $\Delta < 0.5$, but declines compared to the optimal procedure over the rest of the interval. This illustrates a robustness of a different sort: namely, that good performance can be obtained using the robust procedure even if there is some mismatch in the assumed and actual levels of distortion.

In Chapter 3, the performance of two procedures was compared by computing the “robustness index,” which was the ratio of the $\eta$’s for each. This approach is also used in [10] in the context of minimax robust matched filtering in discrete time, and, while not included here, an analogous analysis can be used to compare robust quickest detectors.

Obtaining $\mathbf{s}_L$ and $\mathbf{h}_L$ When the Noise is Correlated

For classes $\mathcal{S}_1$ and $\mathcal{S}_3$ when the noise is correlated ($\Sigma_0$ is not diagonal), the least favorable signals can only be written as a function of the robust filter; that is, $\mathbf{s}_L = \mathbf{s}_L(\mathbf{h}_L)$. In these cases, $\mathbf{s}_L$ and $\mathbf{h}_L$ must be determined numerically. One possible approach to accomplish this is outlined below.

For both $\mathcal{S}_1$ and $\mathcal{S}_3$, the uncertainty region is bounded by the hyperplanes described by the equation $\|\mathbf{s} - \mathbf{s}_0\| = \Delta$. For example, when $\mathbf{s}_0 = (3\ 1)^T$ and the $l_\infty$ norm is used, the region is just a square, as shown in Figure 4.3. In Figure 4.6, a blowup of $\mathcal{S}_3$ is shown for $\Delta = \tfrac{1}{2}$. Notice that the region is simply the intersection of the four half-planes (whose boundaries are just lines in this case) $x_1 \ge 2.5$, $x_1 \le 3.5$, $x_2 \ge 0.5$, and $x_2 \le 1.5$. Let

$$A = \begin{pmatrix} -1 & 0 \\ 1 & 0 \\ 0 & -1 \\ 0 & 1 \end{pmatrix} \quad \text{and} \quad \mathbf{b} = \begin{pmatrix} -2.5 \\ 3.5 \\ -0.5 \\ 1.5 \end{pmatrix}$$

Then the uncertainty region is equivalently described by the inequality $A\mathbf{x} \le \mathbf{b}$, where $\mathbf{x} \ge 0$. In general, when $\mathbf{x} \in \mathbb{R}^p$, the uncertainty region is described by the intersection of $2p$ half-spaces. It is not difficult to see that $\mathcal{S}_1$ can also be described in this manner.

Figure 4.4: Asymptotic performance computations for the signal class $\mathcal{S}_2$.


As discussed earlier in this section, the robust processor is simply the log-likelihood ratio for the least favorable distributions, namely, those whose mean vectors are $\pm\mathbf{s}_L$. It was also shown that $\mathbf{s}_L$ is chosen to minimize the K-L divergence, $I(f_1, f_0) = 2\,\mathbf{s}^T\Sigma_0^{-1}\mathbf{s}$, where $\mathbf{s} \in \mathcal{S}$. Therefore, the least favorable signal can be obtained by solving the following constrained minimization problem:

$$\text{minimize } \mathbf{s}^T\Sigma_0^{-1}\mathbf{s} \quad \text{subject to } A\mathbf{s} \le \mathbf{b},\ \mathbf{s} \ge 0$$

This is the well-known quadratic programming problem, which has been studied thoroughly. In [2], it is shown that this problem can be solved using the modified simplex method by applying the Karush-Kuhn-Tucker conditions, which are necessary and sufficient for optimality. In [5], the solution is determined by applying a descent procedure to the dual problem. The details of these approaches can be found in the references listed.
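For the $l_\infty$ class the constraint set is a box, so a very simple numerical scheme is enough to illustrate the idea. The sketch below uses projected gradient descent; this is a substitute chosen for brevity, not the modified simplex method of [2] or the dual descent procedure of [5]. The function names, the iteration count, and the correlated covariance used in the second call are assumptions made for this illustration.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as a list of two rows."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def least_favorable_box(precision, lower, upper, iters=5000):
    """Minimise s^T Q s over lower <= s <= upper by projected gradient descent.

    precision is Q = Sigma_0^{-1}. The step 1 / (2 ||Q||_inf) never exceeds
    the reciprocal of the gradient's Lipschitz constant 2 * lambda_max(Q).
    """
    step = 1.0 / (2.0 * max(sum(abs(q) for q in row) for row in precision))
    s = [0.5 * (lo + hi) for lo, hi in zip(lower, upper)]  # start at the box centre
    for _ in range(iters):
        grad = [2.0 * sum(q * sj for q, sj in zip(row, s)) for row in precision]
        s = [
            min(max(si - step * gi, lo), hi)  # gradient step, then clip to the box
            for si, gi, lo, hi in zip(s, grad, lower, upper)
        ]
    return s


# Box ||s - s0||_inf <= 1/2 around s0 = (3, 1).
lower, upper = [2.5, 0.5], [3.5, 1.5]

# Uncorrelated case: the closed-form answer is (2.5, 0.5).
s_diag = least_favorable_box(inv2([[10.0, 0.0], [0.0, 1.0]]), lower, upper)

# A correlated covariance chosen for illustration only.
s_corr = least_favorable_box(inv2([[10.0, 3.0], [3.0, 1.0]]), lower, upper)
print(s_diag, s_corr)
```

In the correlated case the minimiser no longer sits at a corner of the square: the second component settles at an interior value, which is why the closed-form expressions for uncorrelated noise cannot simply be reused.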

4.3.2 Noise Covariance Uncertainty

In this section, we assume that the signal $\mathbf{s}_0$ is fixed and known, but that the noise covariance $\Sigma$ lies in the uncertainty class $\mathcal{N}$. This situation might arise in a detection scheme designed to indicate the presence of a disorder, where the observables, $\mathbf{x}$, are snapshots obtained via multiple sensors. As discussed previously, if the sensor covariance, $\Sigma$, is known, the optimal processor, $g(\mathbf{x})$, is the log-likelihood ratio. On the other hand, when $\Sigma$ is not known, a natural approach is to investigate alternative robust processors.

Types of Noise Uncertainty

In [10], two noise classes of the form

$$\mathcal{N} = \{\Sigma : \|\Sigma - \Sigma_0\| \le \varepsilon,\ \Sigma > 0\}$$

are considered:

• $\mathcal{N}_1 = \{\Sigma : \|\Sigma - \Sigma_0\| \le \varepsilon,\ \Sigma > 0\}$, where $\|\cdot\|$ is the unit matrix norm; that is, any norm with the property that $\|I\| = 1$

$$\mathbf{h}_L = \Sigma_L^{-1}\mathbf{s}_0$$

$$\Sigma_L = \Sigma_0 + \varepsilon I$$

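• $\mathcal{N}_2 = \{\Sigma : \|\Sigma - \Sigma_0\| \le \varepsilon,\ \Sigma > 0\}$, where $\|\cdot\|$ is the Euclidean norm; that is, $\|A\|^2 = \sum_{i,j=0}^{k-1} [(A)_{ij}]^2$

$$\mathbf{h}_L = (\Sigma_0 + \varepsilon I)^{-1}\mathbf{s}_0$$

$$\Sigma_L = \Sigma_0 + \sigma_n\,\mathbf{h}_L\mathbf{h}_L^T$$

$$\varepsilon = \sigma_n\,\|\mathbf{h}_L\|^2$$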

The above noise model can arise in a variety of applications, such as in radar or sonar problems where the signal to be detected is embedded in noise which is not completely characterized. For example, a typical underwater environment consists of uniform ambient background noise, superimposed with other sources such as impulsive or nonstationary noise components, and possibly some additional signals which are not of central interest. In such an environment, the sensor covariance, $\Sigma$, can be modelled as follows. Let

$$\Sigma = \Sigma_k + \Sigma_u$$

Here $\Sigma_k$ is the “known” (or adequately estimated) component of the covariance, consisting of the contributions of the uniform background noise as well as measurement uncertainty (usually taken to be i.i.d.), while $\Sigma_u$ is the “unknown” component, which accounts for all of the other sources of interference. The robust techniques discussed above can be directly applied here by defining the noise uncertainty class

$$\mathcal{N} = \{\Sigma : \|\Sigma - \Sigma_k\| \le \varepsilon\}$$

where $\varepsilon$ is selected sufficiently large to account for possible values of $\Sigma_u$.
Asymptotic Performance Under Noise Covariance Uncertainty

Denote the assumed and true noise covariance matrices as $\Sigma_1$ and $\Sigma_2$, respectively.
The procedure for computing $\eta$, given below, is similar to that used in the case of
signal uncertainty.

Suppose Page’s test is implemented using $g_1(x) = h_1^T x$, where $h_1 = \Sigma_1^{-1} s_0$. Define
$v = \omega_0 h_1$. The moment generating function equality (4.5) in this case is

$$1 = \int \exp\{v^T x\}\, |2\pi\Sigma_2|^{-1/2} \exp\left\{-\tfrac{1}{2}(x + s_0)^T \Sigma_2^{-1} (x + s_0)\right\} dx
= \exp\left\{-v^T s_0 + \tfrac{1}{2} v^T \Sigma_2 v\right\}$$

where the last expression is just the multivariate Gaussian moment generating
function. Taking the log of both sides, substituting for $v$, and solving for $\omega_0$, we get

$$\omega_0 = \frac{2 h_1^T s_0}{h_1^T \Sigma_2 h_1}$$

Also noting that $E\{g_1(x) \mid H_1\} = h_1^T s_0$, we have

$$\tilde{\eta}(g_1; s_0, \Sigma_2) = \omega_0 E\{g_1(x) \mid H_1\} = 2\, \frac{|\langle h_1, s_0 \rangle|^2}{\langle h_1, \Sigma_2 h_1 \rangle} \tag{4.12}$$

When the noise assumption is correct (i.e., $\Sigma_1 = \Sigma_2$), $\tilde{\eta} = \eta$ as discussed
previously, and then

$$\eta(g_1; s_0, \Sigma_1) = 2 s_0^T \Sigma_1^{-1} s_0$$

The performance of the nominal and robust procedures can be evaluated by replacing
$\Sigma_1$ with $\Sigma_0$ and $\Sigma_L$, respectively.

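Example

Below is an example illustrating the procedure for determining the robust processor
under covariance uncertainty. As in the example of the previous section, the nominal
parameters are

$$s_0 = \begin{bmatrix} 3 \\ 1 \end{bmatrix} \quad \text{and} \quad \Sigma_0 = \begin{bmatrix} 10 & 0 \\ 0 & 1 \end{bmatrix}$$

and now assume that the covariance uncertainty parameter is $\varepsilon = 2$.

• $\Sigma \in \mathcal{N}_1$

We have directly

$$\Sigma_L = \begin{bmatrix} 10 + \varepsilon & 0 \\ 0 & 1 + \varepsilon \end{bmatrix} = \begin{bmatrix} 12 & 0 \\ 0 & 3 \end{bmatrix}, \qquad
h_L = \Sigma_L^{-1} s_0 = \begin{bmatrix} 0.250 \\ 0.333 \end{bmatrix}$$

As with the signal distortion case, the asymptotic performance can be computed
for the robust and nominal procedures. For the least favorable covariance, we have

$$\eta(g_0; s_0, \Sigma_L) = 2\, \frac{(s_0^T \Sigma_0^{-1} s_0)^2}{s_0^T \Sigma_0^{-1} \Sigma_L \Sigma_0^{-1} s_0} = 1.770$$

$$\eta(g_R; s_0, \Sigma_L) = 2 s_0^T \Sigma_L^{-1} s_0 = 2.167$$

As expected, the robust quickest detector outperforms the nominal version when
the covariance is least favorable.

• $\Sigma \in \mathcal{N}_2$

First, we have

$$h_L = (\Sigma_0 + \varepsilon I)^{-1} s_0 = \begin{bmatrix} 12 & 0 \\ 0 & 3 \end{bmatrix}^{-1} s_0, \qquad
\|h_L\|^2 = \langle h_L, h_L \rangle = 0.1736$$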
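The figures quoted in this example can be checked numerically. The sketch below is only an illustration: it assumes the parameters read from the example ($s_0 = [3, 1]^T$, $\Sigma_0 = \mathrm{diag}(10, 1)$, $\varepsilon = 2$), exploits the fact that every matrix involved is diagonal, and uses helper names (`solve_diag`, `eta`) that are hypothetical rather than taken from the text.

```python
# Illustrative check of the covariance-uncertainty example.
# Assumptions: s0 = [3, 1], Sigma0 = diag(10, 1), eps = 2, all matrices diagonal.

def solve_diag(diag, vec):
    """Return D^{-1} v for a diagonal matrix D given by its diagonal entries."""
    return [v / d for d, v in zip(diag, vec)]

def dot(a, b):
    """Euclidean inner product <a, b>."""
    return sum(x * y for x, y in zip(a, b))

def eta(h, s, sigma_diag):
    """Performance 2 <h, s>^2 / <h, Sigma h> for a diagonal true covariance."""
    quad = sum(d * hi * hi for d, hi in zip(sigma_diag, h))
    return 2.0 * dot(h, s) ** 2 / quad

s0 = [3.0, 1.0]
sigma0 = [10.0, 1.0]
eps = 2.0
sigma_L = [d + eps for d in sigma0]      # least favorable covariance Sigma0 + eps*I

h_nominal = solve_diag(sigma0, s0)       # filter designed for Sigma0
h_robust = solve_diag(sigma_L, s0)       # filter designed for Sigma_L

eta_nominal = eta(h_nominal, s0, sigma_L)
eta_robust = eta(h_robust, s0, sigma_L)
norm_sq = dot(h_robust, h_robust)        # ||h_L||^2 used for the Euclidean-norm class

print(round(eta_nominal, 3), round(eta_robust, 3), round(norm_sq, 4))
```

Under these assumptions the three printed values agree with the 1.770, 2.167 and 0.1736 quoted in the example to the displayed precision.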
In Figure 4.7, the effect of mismatch in the assumed and true covariance
contamination for the class $\mathcal{N}_2$ is examined; let $\varepsilon'$ and $\varepsilon$, respectively, denote these
quantities. The solid lines indicate the asymptotic performance when the nominal
procedure ($\varepsilon' = 0$) and the optimal robust procedure ($\varepsilon' = \varepsilon$) are implemented.
Observe that the test designed for a high uncertainty of $\varepsilon' = 5$ suffers a significant loss
of performance if $\varepsilon$ is small; in particular, for $\varepsilon < 1$, it would be better to use the
nominal procedure. The small distortion performance can be improved by instead
selecting $\varepsilon' = 1$, but this is at the expense of performance when the distortion level
is high. Thus, analogous to the signal distortion example of the previous section, we
conclude that a small amount of error in selecting $\varepsilon'$ is tolerable.


4.3.3  Signal  and  Noise  Uncertainty 

In this section, we discuss the problem of designing the robust procedure for the case
when uncertainty lies in both the signal and noise covariance; that is, $(s, \Sigma) \in \mathcal{S} \times \mathcal{N}$.

Figure 4.7: Design tradeoffs in selection of $\varepsilon'$ for the class $\mathcal{N}_2$.

In general, the robust filter $h_R$ and the least favorable pair $(s_L, \Sigma_L)$ can be obtained
by applying the three conditions of Lemma 1 in Section 4.2.

Since the classes $\mathcal{S}$ and $\mathcal{N}$ are independent, $s_L$ and $\Sigma_L$ can be determined
separately whenever they can be written independently of $h_R$. The robust processor can
be obtained in two steps. First, determine the least favorable covariance $\Sigma_L$. Second,
determine the least favorable signal $s_L$ for the nominal parameter pair $(s_0, \Sigma_L)$. For
example, for the case when $(s, \Sigma) \in \mathcal{S}_2 \times \mathcal{N}_2$, the robust processor is $g_R(x) = h_L^T x$,
where

$$h_L = \left(\Sigma_0 + (\varepsilon + \sigma_s) I\right)^{-1} s_0$$

with $\Delta = \sigma_s \|h_L\|$. For this case, it is interesting to notice that the robust processor
accounts for both the signal and noise uncertainty by adding a white noise component
(proportional to $\Delta$ and $\varepsilon$) to the nominal covariance $\Sigma_0$.

Suppose that the parameters are assumed to be $(s_1, \Sigma_1)$, but in fact they are
$(s_2, \Sigma_2)$. Thus, the test that is implemented uses the processor $g_1(x) = h_1^T x$, where
$h_1 = \Sigma_1^{-1} s_1$. Define $v = \omega_0 h_1$. The left hand side of (4.5) is now

$$\int \exp\{v^T x\}\, |2\pi\Sigma_2|^{-1/2} \exp\left\{-\tfrac{1}{2}(x + s_2)^T \Sigma_2^{-1} (x + s_2)\right\} dx = \exp\left\{-v^T s_2 + \tfrac{1}{2} v^T \Sigma_2 v\right\}$$

Solving for the non-zero root, we get

$$\omega_0 = \frac{2 h_1^T s_2}{h_1^T \Sigma_2 h_1}$$

The other term is $E\{g_1(x) \mid H_1\} = h_1^T s_2$; therefore

$$\eta(g_1; (s_2, \Sigma_2)) = 2\, \frac{|\langle h_1, s_2 \rangle|^2}{\langle h_1, \Sigma_2 h_1 \rangle}$$
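The step from the moment generating function equality to the non-zero root can be written out explicitly. The following is a worked sketch in the notation of this section (with $v = \omega_0 h_1$ substituted), added for illustration only:

```latex
\begin{aligned}
1 &= \exp\left\{-\omega_0 h_1^T s_2 + \tfrac{1}{2}\,\omega_0^2\, h_1^T \Sigma_2 h_1\right\} \\
0 &= \omega_0\left(-h_1^T s_2 + \tfrac{1}{2}\,\omega_0\, h_1^T \Sigma_2 h_1\right)
  \quad\Longrightarrow\quad
  \omega_0 = \frac{2\, h_1^T s_2}{h_1^T \Sigma_2 h_1} \quad (\omega_0 \neq 0) \\
\omega_0\, E\{g_1(x) \mid H_1\} &= \frac{2\, h_1^T s_2}{h_1^T \Sigma_2 h_1}\; h_1^T s_2
  = 2\,\frac{|\langle h_1, s_2\rangle|^2}{\langle h_1, \Sigma_2 h_1\rangle}
\end{aligned}
```

When the assumed and true parameters coincide, $h_1 = \Sigma_2^{-1} s_2$ and the last expression collapses to $2 s_2^T \Sigma_2^{-1} s_2$, in agreement with the matched case of Section 4.3.2.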


4.4  Extensions 


4.4.1  Relationship  to  the  “Minimax  Tuning”  Approach 

In the preceding sections, the minimax criterion was directly applied to the
approximate asymptotic performance measure $\eta$, and the optimal processor was shown to be
the minimax robust matched filter. In this section, we briefly discuss an alternative
method for deriving the robust quickest detector. It turns out that the solutions for
both problems are the same, although the approaches differ.

Suppose that we wish to implement Page’s test using the linear processor $g(x) = s^T \Sigma^{-1} x$,
and notice that this is just the log-likelihood ratio for testing between
$\mathcal{N}(-s, \Sigma)$ and $\mathcal{N}(s, \Sigma)$.⁶ Thus, the test is designed assuming that $(s, \Sigma)$ are the
true parameters. However, suppose that the actual mean vector is $\theta$. Recall that
in Chapter 2, we defined the average sample number (ASN) of a CUSUM procedure
with initial score $z$ to be $N_z(\theta)$. In [1], the following Wald approximations to the
ASN are given.

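$$N_0(\theta) \approx \tilde{N}_0(\theta) = \frac{e^{-2\mu h/\sigma^2} - 1 + 2\mu h/\sigma^2}{2\mu^2/\sigma^2}, \qquad \theta \neq 0$$

$$N_0(0) \approx \tilde{N}_0(0) = \frac{h^2}{\sigma^2}$$

where $\mu = E[g(x) \mid \theta]$ and $\sigma^2 = \mathrm{Var}[g(x) \mid \theta]$. It is straightforward to show that
$\mu = s^T \Sigma^{-1} \theta$ and $\sigma^2 = s^T \Sigma^{-1} s$.

⁶ The constant multiplier has been omitted for convenience.

Define the parameter

$$b = \frac{\mu}{\sigma} = \frac{s^T \Sigma^{-1} \theta}{(s^T \Sigma^{-1} s)^{1/2}}$$

Now $\tilde{N}_0(\theta)$ and $\tilde{N}_0(b)$ can be directly related as

$$\tilde{N}_0(\theta) = \tilde{N}_0(b) = \frac{e^{-2bh/\sigma} - 1 + 2bh/\sigma}{2b^2}$$

Under $H_1$, $b > 0$ (since $\mu > 0$), and $\tilde{N}_0(b)$ is the worst expected delay in detection
when the disorder occurs; similarly, under $H_0$, $b < 0$, and $\tilde{N}_0(b)$ is the mean time
between false alarms. Since the goal is to minimize the former and maximize the
latter, one would like to choose $s$ such that $b^2$ is maximized: this is because when
$b > 0$, the $b^2$ term in the denominator dominates $\tilde{N}_0(b)$, while when $b < 0$, the
exponential term in the numerator dominates. Notice that

$$b^2 = \frac{(s^T \Sigma^{-1} \theta)^2}{s^T \Sigma^{-1} s} = \frac{|\langle h, \theta \rangle|^2}{\langle h, s \rangle}, \qquad \Sigma h = s$$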

which is the same as the value of $\eta$ (less a factor of two). The minimax tuning approach
[1] is to maximize $b^2$ for the least favorable mean vector $\theta$. However, since maximizing
$b^2$ is equivalent to maximizing $\eta$, the results from minimax robust matched filtering
can also be applied to the minimax tuning approach to obtain specific solutions for
the robust processor.

The above approach can also be extended to include the case of covariance
uncertainty. As in Section 4.3.2, let $\Sigma_1$ and $\Sigma_2$ denote the assumed and true covariances,
and suppose that the true signal, $s_0$, is known. Page’s test is then implemented using
the processor $g_1(x) = s_0^T \Sigma_1^{-1} x$. This results in $\mu = s_0^T \Sigma_1^{-1} s_0$ and $\sigma^2 = s_0^T \Sigma_1^{-1} \Sigma_2 \Sigma_1^{-1} s_0$.
Thus,

$$b^2 = \frac{(s_0^T \Sigma_1^{-1} s_0)^2}{s_0^T \Sigma_1^{-1} \Sigma_2 \Sigma_1^{-1} s_0} = \frac{|\langle h_1, s_0 \rangle|^2}{\langle h_1, \Sigma_2 h_1 \rangle}$$

Again, the equivalence between maximizing $b^2$ and $\eta$ in equation (4.12) is apparent.


4.4.2 Computation of $\tilde{\eta}$ For Non-Gaussian Noise

As mentioned in Section 4.1, in some cases it may be desirable to implement a quickest
detection procedure designed using a maximum SNR criterion even if the noise is
non-Gaussian. Here, the computation of $\tilde{\eta}$ which appears in the proof of Proposition 1 is
generalized to include this case.

Suppose $f_0(x)$ and $f_1(x)$ are multivariate non-Gaussian densities, and let $M_0(v)$
denote the moment generating function of $f_0(x)$. Now $\tilde{\eta}$ is obtained via (4.4) and
(4.5) as before; these expressions are repeated here for convenience:

$$\tilde{\eta} = \omega_0 E[g(x) \mid H_1] \tag{4.13}$$

$$1 = \int e^{v^T x} f_0(x)\, dx, \qquad v = \omega_0 h_R \tag{4.14}$$

Now observe that this last expression is just the vector moment generating function
as a function of $v$. Therefore, $\omega_0$ is determined by solving the equation

$$M_0(v) = M_0(\omega_0 h_R) = 1$$

either directly or numerically, and then (4.13) can be determined in a straightforward
manner.
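As an illustration of the numerical route, the sketch below solves $M_0(\omega_0 h) = 1$ by bisection for a hypothetical non-Gaussian $f_0$: a two-component Gaussian mixture with diagonal covariances, chosen only because its moment generating function is available in closed form. The mixture, the filter `h`, and the function names are assumptions made for this example and do not come from the text. The search relies on $\log M_0(\omega h)$ being convex in $\omega$, zero at $\omega = 0$, and decreasing there (the processor has negative mean under $f_0$), so the non-zero root is positive.

```python
import math

def log_mgf(omega, h, components):
    """log M0(omega * h) for a Gaussian mixture with diagonal covariances.

    components: list of (weight, mean_vector, variance_vector) describing f0.
    """
    total = 0.0
    for weight, mean, var in components:
        lin = sum(omega * hi * mi for hi, mi in zip(h, mean))
        quad = sum((omega * hi) ** 2 * vi for hi, vi in zip(h, var))
        total += weight * math.exp(lin + 0.5 * quad)
    return math.log(total)

def nonzero_root(h, components, tol=1e-12):
    """Positive root of log M0(omega * h) = 0, found by bracketing and bisection."""
    hi = 1.0
    while log_mgf(hi, h, components) <= 0.0:   # grow until the function is positive
        hi *= 2.0
    lo = hi
    while log_mgf(lo, h, components) >= 0.0:   # shrink until the function is negative
        lo *= 0.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if log_mgf(mid, h, components) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Hypothetical example: nominal noise contaminated by a heavier-tailed component,
# both centred at -s0 under H0, with the nominal filter h = Sigma0^{-1} s0.
s0 = [3.0, 1.0]
h = [0.3, 1.0]
noise_h0 = [(0.9, [-3.0, -1.0], [10.0, 1.0]),
            (0.1, [-3.0, -1.0], [40.0, 4.0])]

omega0 = nonzero_root(h, noise_h0)
mean_g_h1 = sum(hi * si for hi, si in zip(h, s0))   # E[g(x) | H1], mean +s0 assumed
eta_tilde = omega0 * mean_g_h1                      # equation (4.13)
print(omega0, eta_tilde)
```

For a single Gaussian component the same routine recovers the closed-form root $2 h^T s_0 / (h^T \Sigma h)$, which gives a simple sanity check before applying it to other densities.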
4.5 Conclusions

In  this  chapter,  robust  quickest  detection  procedures  were  investigated  for  the  case 
of  multivariate  Gaussian  observables  with  uncertain  means  and/or  covariances.  This 
statistical  model  arises  in  several  areas,  including  radar,  sonar,  and  other  multisensor 
applications. 

The  most  significant  contribution  establishes  a  formal  connection  between  robust 
quickest  detection  and  robust  matched  filtering.  This  allows  one  to  apply  previous 
results  on  the  latter  to  the  quickest  detection  problem  when  uncertainty  in  the  first 
and  second  order  statistics  exists.  Explicit  solutions  for  the  discrete-time  robust 
matched  filter,  derived  in  [10],  were  used  to  separately  obtain  the  robust  quickest 
detectors  for  uncertainty  in  the  mean  vector  (signal)  and  noise  covariance. 

When the observables are Gaussian, the robust quickest detector is optimal in the
sense that the asymptotic performance measure $\tilde{\eta}$ is maximized for the least favorable
mean and covariance. It was also pointed out that, when the noise is non-Gaussian,
the same techniques may be used to obtain a robust detector, where the goal is simply
to maximize the SNR over the uncertainty class. In either case, expressions for $\tilde{\eta}$ were
derived which characterize the worst case performance of the robust procedure.

Simple  examples  were  given  to  illustrate  the  design  process.  In  some  cases,  such 
as  when  the  noise  is  uncorrelated,  the  design  of  the  processor  that  is  robust  to  both 
signal  and  noise  uncertainty  can  be  carried  out  by  separately  determining  the  least 
favorable  mean  and  covariance.  In  the  more  general  case,  the  solution  can  be  obtained 
by  iteratively  solving  the  set  of  equations  in  Lemma  1. 

There  are  several  interesting  directions  for  future  work.  First,  the  robust  quickest 
detection  problem  could  be  further  generalized  in  a  Hilbert  space  setting,  as  is  done 
in  [8]  for  the  robust  matched  filtering  problem.  Second,  it  would  be  useful  to  obtain 
the  continuous  time  robust  quickest  detector.  Finally,  it  would  be  useful  to  determine 
explicit solutions for the signal distortion problem when the noise is correlated. It was
shown  that  this  problem  can  be  reformulated  as  a  quadratic  programming  problem, 
for  which  two  iterative  approaches  were  mentioned.  A  direct  solution  to  this  problem 
would  be  useful  in  both  robust  quickest  detection  and  robust  matched  filtering,  and 
so  this  area  is  worthy  of  additional  attention. 
Chapter 5

Quickest Detection in Decentralized Decision Systems


5.1  Introduction 

In recent years, there has been an increasing interest in the area of distributed, or
decentralized, detection. A distributed detection system contains two basic entities.
The first is a collection of local detectors, each of which consists of a sensor followed
by some type of decision rule. The second is a central processor, or fusion center,
which  processes  the  local  decisions  and  produces  a  final  decision.  The  use  of  such 
decentralized  decision  schemes  is  motivated  by  the  reduction  in  channel  bandwidth 
that  can  be  achieved  (and  hence  a  reduction  in  system  cost),  and  also  by  the  need 
in  some  situations  for  the  sensors  to  be  separated  by  great  distances.  In  addition, 
a  decrease  in  the  complexity  of  the  decision  procedure  at  the  fusion  center  can  be 
realized.  However,  by  reducing  the  data  locally  instead  of  utilizing  the  complete  data 
set  at  the  central  processor,  some  performance  is  also  sacrificed. 

Early  work  in  distributed  detection  focused  on  one-step  procedures;  that  is,  where 
the  decision  is  based  on  a  single  finite  sample.  In  [16],  the  classical  Bayesian  approach 
to  detection  theory  is  extended  to  the  case  of  distributed  detection.  The  optimization 
of the local and fusion procedures is studied in [16, 5, 13], and is extended in [7, 10]
to the case where the local decisions are correlated. More recently, the distributed
detection problem which incorporates sequential schemes at the local detectors [8]
and fusion center [17, 9, 18] has been considered.

In this chapter, we consider the problem of detecting disorders using a distributed
system. To date, there has been little work in this area. In Teneketzis [14] and
Teneketzis and Varaiya [15], the decentralized quickest detection problem is formulated
by defining a Bayes cost function which penalizes false alarms before the disorder
and large delays in detection after the disorder. The disorder is modelled using
a Markov chain, where the conditional probability of the jump occurring at time
$i + 1$ given that it did not occur at time $i$ is some fixed value, which is presumably
known or inferred from previous data. It is shown in [15] that, for the case where
the cost function is not separable with respect to the local decisions, the local thresholds
are the solutions to a set of coupled dynamic programming equations and are
time-varying. A separable cost function is also considered; this results in fixed local
thresholds, although with a longer delay in detection.

In  this  work,  we  examine  several  fusion  rules  for  the  case  where  the  local  detectors 
simply  consist  of  an  integrator  followed  by  a  comparator  with  a  fixed  threshold.  As 
stated  in  Chapter  2,  the  disorder  time  is  taken  to  be  unknown.  Several  of  the  fusion 
procedures  we  consider  assume  knowledge  of  the  signal  strengths  before  and  after  the 
disorder;  however,  we  also  examine  some  procedures  which  are  applicable  when  this 
information  is  not  available. 

In Section 2, the decentralized detection problem is stated precisely, and the
notation used here (in addition to that of Chapter 2) is presented. In Section 3, the
three distributed procedures under consideration are derived. The first of these is the
ML optimal test, which is shown to admit a recursive form. The second is a version
of Page’s test, similar to the above test, but which requires less computation at each
iteration. The third test is a procedure which is suitable for the important case where
the magnitude of the disorder is unknown. In Section 4, we show how the Markov
approximation approach of Chapter 2 can be modified to compute the performance of
the distributed system. In Section 5, a simple procedure for choosing the thresholds
for the local decision rules is derived based upon an asymptotic performance measure.
It is shown that not only can the thresholds be easily computed, but that the
resulting performance is optimal for practical purposes; the latter point is the subject
of Section 7. In Section 6, the performance of each of the tests is computed for strong
and weak jump magnitude scenarios. Finally, in Section 8, the choice of blocklength
of the local detectors is investigated. It is shown that, in general, the more samples
used in the local decisions, the lower the performance and channel bandwidth cost;
however, in the small signal case, it is actually advantageous to use a larger blocksize
from a performance standpoint.¹

5.2  Problem  Statement 

The decentralized detection system under consideration is shown in Figure 5.1. Here
$\{x_\ell(i)\}_{i=1}^{n}$ is the sequence of samples received by sensor $\ell$ up to time $n$, where
$\ell = 1, 2, \ldots, L$, and $L$ is the total number of sensors. The sampling frequency is fixed at
$f_s = 1/T_s$. The disorder is modelled as a step change in the mean of the observables;
that is:

$$H_0: \quad x_\ell(i) = n_\ell(i), \qquad i = 1, 2, \ldots, m - 1$$

$$H_1: \quad x_\ell(i) = n_\ell(i) + s_\ell, \qquad i = m, m + 1, \ldots$$

for each sensor $\ell = 1, 2, \ldots, L$, where the $n_\ell(i)$ are samples from a zero mean Gaussian
distribution which is both spatially and temporally uncorrelated with $E[n_\ell^2(i)] = \sigma_\ell^2$,
and $m$ is the unknown disorder time. Note that although we have chosen a Gaussian
noise model, the analyses which appear in subsequent sections are equally applicable
when other distributions are used.

Figure 5.1: Structure of the distributed system.

¹ A preliminary version of this work has appeared in [6].

Each local detector consists of a summation block followed by a comparator as
shown in Figure 5.1. A binary decision is made indicating whether or not the sum of
$M$ successive samples exceeds the fixed local threshold $h_\ell$. The decisions are produced
by the local detectors at a rate of one every $M$ samples and are denoted by $\{u_\ell(k)\}$,
where

$$u_\ell(k) = \mathbf{1}\{w_\ell(k) > h_\ell\} \tag{5.1}$$

$$w_\ell(k) = \sum_{i = M(k-1)+1}^{Mk} x_\ell(i) \tag{5.2}$$

for $\ell = 1, 2, \ldots, L$, where $\mathbf{1}\{A\}$ is the indicator of the event $A$. Specifically, sample
$x_\ell(i)$ is involved in decision $u_\ell(k)$ if and only if $k = \lceil i/M \rceil$, where $\lceil x \rceil$ denotes the
smallest integer greater than or equal to $x$. Finally, global decisions $\{d(k)\}$ are produced
at the fusion center based upon past and present local decisions.
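The local rule of (5.1)-(5.2) amounts to summing non-overlapping blocks of $M$ samples and thresholding each sum. A minimal sketch for a single sensor follows; the function names and the toy data are illustrative assumptions, and any trailing partial block is simply ignored.

```python
def local_decisions(samples, block_size, threshold):
    """Binary decisions u(k): 1 if the k-th block sum w(k) exceeds the threshold.

    Only complete blocks of `block_size` samples produce a decision.
    """
    n_blocks = len(samples) // block_size
    decisions = []
    for k in range(n_blocks):
        block_sum = sum(samples[k * block_size:(k + 1) * block_size])  # w(k), eq. (5.2)
        decisions.append(1 if block_sum > threshold else 0)            # u(k), eq. (5.1)
    return decisions

def block_index(i, block_size):
    """1-based block k = ceil(i / M) that the 1-based sample index i falls in."""
    return -(-i // block_size)

# Toy usage: M = 8, a jump of 1 appears from the 9th sample onward (noise omitted).
toy = [0.0] * 8 + [1.0] * 8
print(local_decisions(toy, 8, 4.0), block_index(9, 8))
```

Each sensor therefore sends one bit per $M$ samples to the fusion center, which is the source of the bandwidth reduction discussed in the introduction.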
The above local detection procedure is illustrated in Figure 5.2 for a blocksize of
$M = 8$. Here $k_0$ denotes the block in which the disorder (at time sample $m$) occurs.
The joint distributions of the samples in blocks $1, 2, \ldots, k_0 - 1$ are independent and
identically distributed, as are those of blocks $k_0 + 1, k_0 + 2, \ldots$ Since the sample
statistics are known before and after the disorder, the distributions of every block
except $k_0$ are also known. In block $k_0$, the samples may have any one of $M$ joint
distributions due to the $M$ possible disorder times within the block.²


Figure 5.2: Operation of the local detectors. (Time axis of samples $\ldots, m-2, m-1, m, m+1, m+2, \ldots$ partitioned into blocks $1, \ldots, k_0 - 1$, block $k_0$, and blocks $k_0 + 1, k_0 + 2, \ldots$; shown for $M = 8$ and $\mu = 5$.)


As explained in Chapter 2, the goal of the overall procedure is to minimize the
worst expected time to detect the disorder, $D$, subject to a lower bound on the mean
time between false alarms, $T$. Along these lines, several options for the fusion rule will
be considered, and their relative performance will be determined by computing plots
of $D$ versus $\log T$ for each. We will again be interested in the asymptotic performance
measure

$$\eta = \lim_{T \to \infty} \frac{\log T}{D}$$

It will be shown in Section 5 that a lower bound $\tilde{\eta} \le \eta$ can also be obtained for
the distributed detection problem and that this bound can be used to compute the
thresholds of the local tests.

² A slightly modified version of the above detection scheme can also be used in continuous time
applications, such as in the case where multiple sensors are used to measure radar returns. The
approach for the continuous-time problem is essentially the same, and is outlined in Appendix A.

In analyzing the various distributed detection procedures, it will be convenient to
utilize two versions of the usual stopping variable:³

• $N$ is the stopping time expressed in samples

• $\tilde{N}$ is the stopping time expressed in blocks

Since  global  decisions  are  produced  only  after  each  block  of  M  snapshots  is  processed, 
it  is  natural  to  express  T  and  D  as  the  “expected  numbers  of  blocks”  before  an  alarm. 
However,  since  a  disorder  can  occur  at  any  one  of  M  time  instants  within  block  ko,  we 
would  ultimately  like  T  and  D  to  be  in  terms  of  samples  (i.e.,  number  of  snapshots). 
This  also  allows  us  to  compare  procedures  whose  blocksizes  differ. 

Recall in Chapter 2, Section 4, that

$$T = N_0(\theta_0) \quad \text{and} \quad D = N_0(\theta_1)$$

where $N_z(\theta)$ was the ASN of Page’s procedure with initial score $z$. Here we introduce
a new definition, the average block number (ABN), which describes the stopping time
of Page’s test in terms of the number of blocks rather than number of samples. The
two ABN’s of interest are:

$E_0 \tilde{N}$ = expected number of blocks before stopping when all $M$ samples
in every block are generated under $H_0$

$E_1[\tilde{N} \mid \mu]$ = expected number of blocks before stopping where:

i.) in the first block, $M - \mu$ samples are generated under $H_0$
and $\mu$ are generated under $H_1$

ii.) in all subsequent blocks, all of the $M$ samples are generated
under $H_1$

³ Throughout this chapter, a tilde indicates units of blocks.
where the initial score is zero in both cases. In Appendix B, the following relationships
between the ASN’s and ABN’s are derived:

$$N_0(\theta_0) = M \cdot E_0 \tilde{N} \tag{5.3}$$

$$N_0(\theta_1) = \mu + \left(E_1[\tilde{N} \mid \mu] - 1\right) \cdot M \tag{5.4}$$

where again $\mu$ is the number of samples in block $k_0$ taken from $H_1$. Here we will let
$\mu$ be fixed, but later we will take into account the fact that $\mu$ is actually random. To
compute the performance of the various procedures, the ABN’s will first be computed,
and then converted to units of samples via (5.3)-(5.4).


5.3  Derivation  of  Fusion  Rules 

In this section, we introduce four procedures. First, the optimal ML procedure at the
fusion center is derived, and it is shown that this test admits a recursive implementation.
Second, a version of Page’s test is considered; this procedure is more practical
because it eliminates the need for performing an explicit maximization at every stage
of the test. Third, a procedure that is suitable for the important case where $s_1, \ldots, s_L$
are unknown is presented; this procedure is sometimes referred to as Hinkley’s test
[1]. Finally, the ML optimal test for the case where all of the sensor data is available
to the central processor is derived; the performance of this procedure is used as a
standard to which the distributed procedures can be compared.

5.3.1  Known  Signal  Case 

Let $u(k) = \{u_\ell(k)\}_{\ell=1}^{L} \in \{0, 1\}^L$ denote the local decisions for block $k$, and let $f(u \mid \mu)$
denote the distribution of $u$ given $\mu \in \{0, \ldots, M\}$, where the first $M - \mu$ samples of
the block are from $H_0$ and the last $\mu$ are from $H_1$. Thus, the decisions are distributed
as $f(u \mid \mu = 0)$ for blocks $1, \ldots, k_0 - 1$ and as $f(u \mid \mu = M)$ for blocks $k_0 + 1, k_0 + 2, \ldots$,
while at block $k_0$ the distribution is $f(u \mid \mu)$ for some $\mu \in \{1, \ldots, M\}$. Each local
decision is computed according to (5.1)-(5.2). Since the observables are Gaussian, so
are the sums $w_\ell(k)$. Specifically, if the joint distribution for block $k$ is $f(u \mid \mu)$, then
$w_\ell(k) \sim \mathcal{N}(\mu s_\ell, M\sigma_\ell^2)$, $\ell = 1, \ldots, L$, and $\Pr\{u_\ell(k) = 1\} = \theta_\ell(\mu)$, where

$$\theta_\ell(\mu) = 1 - \Phi\left(\frac{h_\ell - \mu s_\ell}{\sqrt{M}\,\sigma_\ell}\right) \tag{5.5}$$

is the power of the local fixed sample test when the disorder occurs $\mu$ samples before
the end of the block, and $\Phi(\cdot)$ is the cumulative distribution function for the standard
normal distribution.⁴
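Because the block sum satisfies $w_\ell(k) \sim \mathcal{N}(\mu s_\ell, M\sigma_\ell^2)$ and the local decision is one when $w_\ell(k)$ exceeds $h_\ell$, the probability $\theta_\ell(\mu)$ is a Gaussian tail probability. The sketch below evaluates it with the standard library error function; the function names and the numerical values (jump 0.5, unit noise variance, $M = 8$, threshold 2) are hypothetical choices for illustration.

```python
import math

def std_normal_cdf(z):
    """Standard normal cumulative distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def local_power(mu, jump, sigma, block_size, threshold):
    """Pr{u = 1} when the last `mu` of the M samples in a block carry the jump.

    Uses w ~ N(mu * jump, M * sigma^2) and the comparator u = 1{w > threshold}.
    """
    z = (threshold - mu * jump) / (sigma * math.sqrt(block_size))
    return 1.0 - std_normal_cdf(z)

# Hypothetical sensor: jump 0.5, noise standard deviation 1, M = 8, threshold 2.
M = 8
thetas = [local_power(mu, 0.5, 1.0, M, 2.0) for mu in range(M + 1)]
alpha = thetas[0]    # theta(0): local false alarm probability per block
beta = thetas[-1]    # theta(M): local detection probability per block
print(alpha, beta)
```

The values $\theta_\ell(0)$ and $\theta_\ell(M)$ computed this way are the per-block false alarm and detection probabilities that parameterize the fusion rules derived in the rest of this section.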

The distributed quickest detection problem may be alternatively stated in terms
of blocks rather than samples as follows. Define the hypotheses $K_0$, $K'_\mu$, and $K_1$ as:

$$\begin{aligned}
K_0 &: \ \Pr\{u_\ell(k) = 1\} = \alpha_\ell, & k &= 1, \ldots, k_0 - 1 \\
K'_\mu &: \ \Pr\{u_\ell(k) = 1\} = \theta_\ell(\mu), & k &= k_0, \ \mu \in \{1, 2, \ldots, M\} \\
K_1 &: \ \Pr\{u_\ell(k) = 1\} = \beta_\ell, & k &= k_0 + 1, k_0 + 2, \ldots
\end{aligned} \tag{5.6}$$

where for convenience we have defined $\alpha_\ell = \theta_\ell(0)$ and $\beta_\ell = \theta_\ell(M)$. Thus, when a disorder
occurs the progression $K_0 \to K'_\mu \to K_1$ results.⁵

Let $\mathcal{L}_n(\mu, k_0)$ denote the likelihood ratio of the local decisions up to and including
block $n$ assuming that the disorder occurs in block $k_0$, where $1 \le k_0 \le n$. Specifically:

$$\begin{aligned}
\mathcal{L}_n(\mu, k_0) &= \frac{\prod_{k=1}^{k_0 - 1} f(u(k) \mid \mu = 0) \cdot f(u(k_0) \mid \mu) \cdot \prod_{k=k_0+1}^{n} f(u(k) \mid \mu = M)}{\prod_{k=1}^{n} f(u(k) \mid \mu = 0)} \\
&= \frac{f(u(k_0) \mid \mu)}{f(u(k_0) \mid \mu = 0)} \prod_{k=k_0+1}^{n} \frac{f(u(k) \mid \mu = M)}{f(u(k) \mid \mu = 0)}, \qquad 1 \le k_0 \le n
\end{aligned} \tag{5.7}$$

The ML procedure is to maximize (5.7) over the quantities $\mu$ and $k_0$; a disorder is
declared if this quantity exceeds a fixed threshold. This is also called the generalized
likelihood ratio test (GLRT) [2]. In general, this test does not admit a recursive
solution which would make the procedure more useful in real-time applications. However,
a recursive implementation does exist for the present problem, as shown below.

⁴ If the observations are not Gaussian, then $\theta_\ell(\mu)$ can be redefined accordingly.

⁵ Observe that hypotheses $K_0$, $K'_\mu$ and $K_1$ are dependent on the hypotheses $H_0$ and $H_1$. In
particular, $K_0$ holds if and only if every sample within the block is from $H_0$, and similarly for $K_1$
and $H_1$, while $K'_\mu$ holding implies that samples potentially come from both $H_0$ and $H_1$. Notice also
that $K'_\mu = K_1$ when $\mu = M$.

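Since the $u_\ell(k)$ are simply Bernoulli random variables, we have

$$f(u \mid \mu) = \prod_{\ell=1}^{L} \theta_\ell(\mu)^{u_\ell} \left(1 - \theta_\ell(\mu)\right)^{1 - u_\ell} \tag{5.8}$$

Substituting (5.8) into (5.7) and taking the log of both sides, we have

$$\begin{aligned}
l_n(\mu, k_0) &= \log \mathcal{L}_n(\mu, k_0) \\
&= \log \prod_{\ell=1}^{L} \left(\frac{\theta_\ell(\mu)}{\alpha_\ell}\right)^{u_\ell(k_0)} \left(\frac{1 - \theta_\ell(\mu)}{1 - \alpha_\ell}\right)^{1 - u_\ell(k_0)}
 + \sum_{k=k_0+1}^{n} \log \prod_{\ell=1}^{L} \left(\frac{\beta_\ell}{\alpha_\ell}\right)^{u_\ell(k)} \left(\frac{1 - \beta_\ell}{1 - \alpha_\ell}\right)^{1 - u_\ell(k)} \\
&= \sum_{\ell=1}^{L} \left\{ u_\ell(k_0) \log\left(\frac{\theta_\ell(\mu)}{\alpha_\ell}\right) + (1 - u_\ell(k_0)) \log\left(\frac{1 - \theta_\ell(\mu)}{1 - \alpha_\ell}\right) \right\}
 + (n - k_0)\, d + \sum_{k=k_0+1}^{n} \sum_{\ell=1}^{L} c_\ell u_\ell(k)
\end{aligned} \tag{5.9}$$

where $c_\ell = \log\left(\frac{\beta_\ell (1 - \alpha_\ell)}{\alpha_\ell (1 - \beta_\ell)}\right)$ and $d = \sum_{\ell=1}^{L} \log\left(\frac{1 - \beta_\ell}{1 - \alpha_\ell}\right)$. Define the test statistic

$$S_n = \max_{1 \le k_0 \le n} \ \max_{1 \le \mu \le M} \ l_n(\mu, k_0)$$

The GLRT is then to declare a disorder at block $n$ in case $S_n > h$, where the threshold
$h$ of the test is chosen to satisfy a false alarm condition. Since $\mu$ appears only in the
first term of (5.9), we define

$$\phi(u(k)) = \max_{1 \le \mu \le M} \sum_{\ell=1}^{L} \left\{ u_\ell(k) \log\left(\frac{\theta_\ell(\mu)}{\alpha_\ell}\right) + (1 - u_\ell(k)) \log\left(\frac{1 - \theta_\ell(\mu)}{1 - \alpha_\ell}\right) \right\} \tag{5.10}$$

for each block $k$, so that

$$S_n = \max_{1 \le k_0 \le n} \left\{ \phi(u(k_0)) + (n - k_0)\, d + \sum_{k=k_0+1}^{n} \sum_{\ell=1}^{L} c_\ell u_\ell(k) \right\} \tag{5.11}$$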
In order to obtain the recursive version of this test, assume that $S_n$ has been
computed prior to block $n + 1$. Now:

$$\begin{aligned}
S_{n+1} &= \max_{1 \le k_0 \le n+1} \left\{ \phi(u(k_0)) + (n + 1 - k_0)\, d + \sum_{k=k_0+1}^{n+1} \sum_{\ell=1}^{L} c_\ell u_\ell(k) \right\} \\
&= \max\left\{ \phi(u(n+1)),\ \max_{1 \le k_0 \le n} \left[ \phi(u(k_0)) + (n + 1 - k_0)\, d + \sum_{k=k_0+1}^{n+1} \sum_{\ell=1}^{L} c_\ell u_\ell(k) \right] \right\} \\
&= \max\left\{ \phi(u(n+1)),\ \max_{1 \le k_0 \le n} \left[ \phi(u(k_0)) + (n - k_0)\, d + \sum_{k=k_0+1}^{n} \sum_{\ell=1}^{L} c_\ell u_\ell(k) \right] + d + \sum_{\ell=1}^{L} c_\ell u_\ell(n+1) \right\}
\end{aligned} \tag{5.12}$$

where in the last line the maximization has been computed separately over the sets
$\{1 \le k_0 \le n\}$ and $\{k_0 = n + 1\}$, and with the convention that $\sum_{i=j}^{k} = 0$ when $j > k$.
Finally, (5.11) and (5.12) together yield the recursive form of the test:

$$\mathcal{P}_1: \quad \begin{cases} S_n = \max\{S_{n-1} + g_1(u(n)),\ \phi(u(n))\}, & S_0 = 0 \\ \tilde{N}_1 = \inf\{n \mid S_n > h\} \end{cases}$$

where

$$g_1(u(n)) = \sum_{\ell=1}^{L} c_\ell u_\ell(n) + d$$

Here, $\tilde{N}_1$ is the stopping time of the test and $g_1$ is the log-likelihood ratio for testing
$K_0$ versus $K_1$. Note that the stopping time is expressed in blocks; the conversion
to samples is done using (5.3)-(5.4). A block diagram of this procedure is shown in
Figure 5.3.

One  drawback  of  the  optimal  test  is  the  need  to  compute  <^(u(/c))  for  each  block. 
Since  this  requires  a  maximization  over  a  potentially  large  number  of  points,  we  are 
motivated  to  consider  the  following  procedure: 


Sn  =  max{5„_i  +  £2(11(71)),  0},  5o  =  0 

< 

N2  —  inf{n  |  Sn  >  h} 
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Figure  5.3:  Structure  of  the  ML  optimal  procedure. 


Figure  5.4:  Structure  of  the  suboptimal  procedure. 


where $g_2 = g_1$ and $\phi(\cdot)$ is replaced by zero. The structure of this test is shown in
Figure 5.4. This is just the familiar Page procedure of Chapter 2. Unlike procedure
$\mathcal{P}_1$, $\mathcal{P}_2$ does not explicitly incorporate the information for the change block into the
global decision. Note that $\mathcal{P}_1$ can essentially be viewed as a Page procedure with a
lower boundary which is dependent on the data (the lower boundary is zero for $\mathcal{P}_2$).
Another interesting point is that if it were known a priori that the disorder occurred
at the beginning of a block, i.e., $\mu = M$, $\mathcal{P}_2$ would be the optimal test not only in
the ML sense, but also in the sense of minimizing $D$ for any fixed $T$, the criterion of
Lorden which was discussed in Chapter 2.
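
The two recursions differ only in the lower boundary applied at each block, which the following Python sketch makes explicit. It is an illustration under stated assumptions, not the implementation used for the results in this chapter: the names `run_fusion_test`, `increment` and `lower_boundary` are hypothetical, `increment` stands in for $g_1$ (or $g_2$), and `lower_boundary` stands in for the data-dependent term $\phi$, which is defined earlier in the chapter and is not reproduced here. Leaving `lower_boundary` unset gives the Page-type procedure with a zero boundary.

```python
from typing import Callable, Iterable, Optional, Sequence


def run_fusion_test(
    blocks: Iterable[Sequence[int]],
    increment: Callable[[Sequence[int]], float],
    h: float,
    lower_boundary: Optional[Callable[[Sequence[int]], float]] = None,
) -> Optional[int]:
    """Return the 1-based index of the first block with S_n > h, or None.

    blocks         : sequence of local-decision vectors u(n), one per block
    increment      : hypothetical stand-in for g(u(n))
    lower_boundary : hypothetical stand-in for phi(u(n)); None means 0
    """
    s = 0.0  # S_0 = 0
    for n, u in enumerate(blocks, start=1):
        floor = 0.0 if lower_boundary is None else lower_boundary(u)
        # S_n = max{S_{n-1} + g(u(n)), lower boundary}
        s = max(s + increment(u), floor)
        if s > h:
            return n  # stopping time, expressed in blocks
    return None  # no alarm within the supplied data
```

Note that the returned stopping time is in blocks, so it still has to be converted to samples as described in the text.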

5.3.2  Unknown  Signal  Case 

In the above procedures, the jump magnitudes of the signals at the disorder time are
taken to be known. However, in some situations, such as when the location of the
phenomenon causing the disorder is not known, the resulting signal strengths will
also not be known. This motivates us to consider an additional version of Page’s
test suitable for problems where the jump magnitude is not known. The approach is
similar to the case where the jumps $\{s_l\}_{l=1}^{L}$ are known, except that now we assume
that $s_l \ge S_l$ for $l = 1, \ldots, L$, where $\{S_l\}_{l=1}^{L}$ is a set of minimum jump magnitudes.
The derivations in Section 5.3.1 are then carried out assuming $s_l = S_l$. Thus, this
test is designed for the disorder of minimum magnitude, although the monotonicity
of the likelihood ratio means that the procedure will also react to larger jumps.

First, we assume without loss of generality that $S_l = S$. This implies $\beta_l(\mu) = \beta(\mu)$,
$\alpha_l = \alpha$, and $\beta_l = \beta$. Not only does this simplify the calculations, but it may also be a
realistic assumption when little is known about the origin of the disturbance. Define
$v(k) = \sum_{l=1}^{L} u_l(k)$. The log-likelihood ratio in (5.9) now reduces to

$$L(\mu, k_0) = v(k_0)\log\frac{\beta(\mu)}{\alpha} + (L - v(k_0))\log\frac{1-\beta(\mu)}{1-\alpha} + (n - k_0)\,d + c\sum_{k=k_0+1}^{n} v(k) \qquad (5.13)$$

where $c = \log\frac{\beta(1-\alpha)}{\alpha(1-\beta)}$ and $d = L\log\frac{1-\beta}{1-\alpha}$. Following the same procedure as in (5.12)
and again neglecting $\phi(\cdot)$, we eventually arrive at a sequential procedure similar to
$\mathcal{P}_2$ for detecting changes of unknown signal strength:


$$\mathcal{P}_3:\quad \begin{cases} S_n = \max\{S_{n-1} + g_3(v(n)),\ 0\}, \quad S_0 = 0 \\ N_3 = \inf\{n \mid S_n > h\} \end{cases}$$

where $g_3(v(n)) = c\,v(n) + d$ is the log-likelihood ratio for testing $K_1$ versus $K_0$, where
now $K_1$ denotes the minimum jump hypothesis (i.e., $s_l = S$, $\forall l$). We refer to this as
Hinkley’s test. The structure of $\mathcal{P}_3$ is similar to that of $\mathcal{P}_2$ (Figure 5.4), except that
now the vector of local decisions $\mathbf{u}$ is replaced by the sum $v$. Notice that for this test,
the fact that the minimum jump is $S$ for each sensor means that all measurements
are equally weighted. Therefore, we are able to simply consider the sum of the sensor
measurements for each snapshot.
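
As a small numerical illustration of the increment used by Hinkley’s test, the sketch below evaluates $g_3(v) = c\,v + d$ for identical sensors. It assumes the usual Bernoulli log-likelihood-ratio forms $c = \log[\beta(1-\alpha)/(\alpha(1-\beta))]$ and $d = L\log[(1-\beta)/(1-\alpha)]$, with $\alpha$ and $\beta$ the local false-alarm and detection probabilities designed for the minimum jump; the function names are hypothetical.

```python
import math


def hinkley_coefficients(alpha: float, beta: float, num_sensors: int):
    """Return (c, d) for identical sensors (assumed Bernoulli LLR form)."""
    c = math.log(beta * (1.0 - alpha) / (alpha * (1.0 - beta)))
    d = num_sensors * math.log((1.0 - beta) / (1.0 - alpha))
    return c, d


def hinkley_increment(v: int, c: float, d: float) -> float:
    """g_3(v) = c*v + d, where v is the number of sensors reporting a 1."""
    return c * v + d
```

With these forms, a block in which all $L$ sensors report a 1 contributes $L\log(\beta/\alpha)$, and a block in which none do contributes $L\log[(1-\beta)/(1-\alpha)]$, which is negative whenever $\beta > \alpha$.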

5.3.3 Non-Distributed Case


In  this  case,  all  of  the  data  is  available  at  the  fusion  center  (i.e.,  no  local  decisions 
are  made).  The  ML  optimal  test,  derived  in  Appendix  C,  is: 


$$\mathcal{P}_0:\quad \begin{cases} S_n = \max\{S_{n-1} + z_n,\ 0\}, \quad S_0 = 0 \\ N_0 = \inf\{n \mid S_n > h\} \end{cases}$$


where 


} 


This is the optimal procedure in terms of the criterion of Lorden (cf. Chapter 2).
To see this, note that this procedure tests for a change in the mean of the univariate
random variable $z_j$ from $E(z_j \mid H_0)$ to $E(z_j \mid H_1)$, where

$$E(z_j \mid H_1) = -E(z_j \mid H_0) = \frac{1}{2}\sum_{l=1}^{L}\frac{s_l^2}{\sigma_l^2}.$$

It  will  be  shown  that,  while  this  test  requires  the  largest  channel  bandwidth,  it  also 
yields  the  best  performance  because  the  information  is  not  reduced  locally.  It  is 
therefore  included  as  a  benchmark  to  which  the  other  procedures  are  compared. 


5.4  Performance  Computation 

For each of procedures $\mathcal{P}_0$-$\mathcal{P}_3$, performance curves are generated by computing the
pair $(T, D)$ over a range of uniformly spaced values of $h$. For the non-distributed
procedure ($\mathcal{P}_0$), the Markov approximation method described in Chapter 2, Section
4 is used. A modified version of this method can also be used for the distributed
procedures, as described below.

Recall from Chapter 2 that $r_i(n)$ was defined as the probability of reaching stage
$n$ under hypothesis $H_i$, and that the ASN could be expressed in terms of
these probabilities. For the distributed procedures it is useful to redefine $r_i(n)$ in terms of blocks; that is,
a “stage” in this case is just a block. Specifically, $r_0(n)$ is the probability of reaching
block $n$ when the disorder never occurs, and $r_1(n; \mu)$ is the probability when the
disorder occurs at sample time $\mu$ in the first block.

The Markov approximation technique can be used as in Chapter 2 with one modification:
the statistics of block $k_0$ (the change block) differ from those of subsequent
blocks. Therefore, it is necessary to compute separate probability transition matrices
for each of the three hypotheses $K_0$, $K_c$, and $K_1$; denote these as $Q_0$, $Q_c$, and $Q_1$,
respectively, along with submatrices $R_0$, $R_c$, and $R_1$.$^6$ At each stage, there are $2^L$
possible input vectors $\mathbf{u}$. Thus, the $Q$'s can be determined in the following way. Define
the states $a_0, \ldots, a_p$ and $a_*$ as in Chapter 2, and let $\mathrm{bin}(j)$ denote the binary version of
integer $j$: for example, for $L = 4$ sensors, $\mathrm{bin}(13) = [1\ 1\ 0\ 1]^T$. Let $\varphi(\cdot, \cdot)$ be a generic
mapping for incrementing the test statistic; namely, if $\zeta_0, \zeta_1 \in \{a_0, \ldots, a_p, a_*\}$, then
$\zeta_1 = \varphi(\zeta_0, \mathbf{u})$ indicates that input $\mathbf{u}$ produces the state transition $\zeta_0 \mapsto \zeta_1$. Now $Q$
can be determined using the following procedure:

1. Set $Q_{i,j} = 0$, $\forall i, j$.

2. For $i = 0, 1, \ldots, p$ and $j = 0, 1, \ldots, 2^L - 1$:

(a) $\mathbf{u} = \mathrm{bin}(j)$, $p_u = \Pr\{\mathbf{u}\}$

(b) $k = \ell$, where $\ell$ is such that $a_\ell = \varphi(a_i, \mathbf{u})$, with the convention that $k = p + 1$ if $a_* = \varphi(a_i, \mathbf{u})$

(c) Set $Q_{i,k} \leftarrow Q_{i,k} + p_u$

3. Set $Q_{p+1,p+1} = 1$.

Step 3 reflects the fact that state $p + 1$ is a terminal state.

$^6$The subscript “c” stands for “change”.
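
The construction of $Q$ can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used for the results in this chapter. The states are taken to be $p + 1$ equally spaced levels on $[0, h]$. The transition mapping is taken to be "add the increment, clip at zero, round to the nearest level". The sensors are assumed independent, with $\Pr\{u_l = 1\}$ given by `probs`. The names `bin_vector`, `build_transition_matrix` and `increment` are hypothetical.

```python
import numpy as np


def bin_vector(j: int, num_sensors: int) -> list:
    """Binary version of j, most significant bit first (bin(13) -> [1, 1, 0, 1])."""
    return [(j >> (num_sensors - 1 - l)) & 1 for l in range(num_sensors)]


def build_transition_matrix(probs, increment, h: float, p: int) -> np.ndarray:
    """Build the (p+2) x (p+2) transition matrix Q.

    probs     : Pr{u_l = 1} for each sensor (independence is assumed)
    increment : hypothetical stand-in for the statistic increment g(u)
    h         : global threshold; states 0..p quantize [0, h] (assumption)
    State p+1 is the absorbing state reached when the statistic exceeds h.
    """
    num_sensors = len(probs)
    levels = np.linspace(0.0, h, p + 1)
    step = h / p
    Q = np.zeros((p + 2, p + 2))                      # step 1
    for i in range(p + 1):                            # step 2
        for j in range(2 ** num_sensors):
            u = bin_vector(j, num_sensors)            # step 2(a)
            p_u = np.prod([q if bit else 1.0 - q for q, bit in zip(probs, u)])
            s_next = max(levels[i] + increment(u), 0.0)
            # step 2(b): index of the destination state
            k = p + 1 if s_next > h else int(round(s_next / step))
            Q[i, k] += p_u                            # step 2(c)
    Q[p + 1, p + 1] = 1.0                             # step 3
    return Q
```

Each row of the result sums to one. Under these assumptions, passing the local false-alarm probabilities as `probs` would give $Q_0$, and passing the local detection probabilities would give $Q_1$.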

The ABNs are determined in the same manner as are the ASNs in Chapter 2.
Here, the expected stopping times are

$$E_0 N = \sum_{n=1}^{\infty} n\,\bigl(r_0(n) - r_0(n+1)\bigr) = \sum_{n=1}^{\infty} r_0(n) \qquad (5.14)$$

and similarly

$$E_1[N \mid \mu] = \sum_{n=1}^{\infty} r_1(n; \mu). \qquad (5.15)$$

For the false alarm case, we have

$$r_0(n) = \pi_0' R_0^{\,n-1}\mathbf{1}, \quad n = 1, 2, \ldots$$

which, when substituted into (5.14), results in

$$E_0 N = \pi_0'(I - R_0)^{-1}\mathbf{1}. \qquad (5.16)$$

Notice that (5.16) is identical to (2.14) in Chapter 2.$^7$

The computation of $E_1[N \mid \mu]$ is done in a similar manner, except that we must
now also take the contribution of the change block into account. The transition
matrix for this case is $Q_c$ for the first stage and $Q_1$ for subsequent stages. Thus

$$\begin{aligned} r_1(1; \mu) &= 1 \\ r_1(2; \mu) &= \pi_0' R_c \mathbf{1} \\ r_1(3; \mu) &= \pi_0' R_c R_1 \mathbf{1} \\ &\ \,\vdots \\ r_1(n; \mu) &= \pi_0' R_c R_1^{\,n-2}\mathbf{1} \end{aligned} \qquad (5.17)$$

Therefore, (5.15) together with (5.17) yields

$$\begin{aligned} E_1[N \mid \mu] &= 1 + \sum_{n=2}^{\infty} \pi_0' R_c R_1^{\,n-2}\mathbf{1} \\ &= 1 + \pi_0' R_c \Bigl(\sum_{n=0}^{\infty} R_1^{\,n}\Bigr)\mathbf{1} \\ &= 1 + \pi_0' R_c (I - R_1)^{-1}\mathbf{1} \qquad (5.18) \end{aligned}$$

$^7$Recall that $\pi_0$ is a truncated version of the state probability vector.

Finally, the quantities in (5.16) and (5.18) are converted into units of samples via
(5.3)-(5.4), which results in:

$$N_0(\theta_0) = M\,\pi_0'(I - R_0)^{-1}\mathbf{1} \qquad (5.19)$$

$$N_0(\theta_1) = \mu + M\,\pi_0' R_c (I - R_1)^{-1}\mathbf{1} \qquad (5.20)$$

A different $R_c$ is computed for each $\mu = 1, 2, \ldots, M$. However, observe that the
product $\pi_0' R_c$ simply picks off the first row of $R_c$; therefore, for each $\mu$, it is necessary
only to compute the transition probabilities out of the initial state ($a_0$).
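
Once the three submatrices are available, (5.19) and (5.20) reduce to two linear solves. The sketch below assumes that $\pi_0$ places all of its mass on the initial state, which is consistent with the remark that $\pi_0' R_c$ picks off the first row of $R_c$. The function name `mean_times_in_samples` is hypothetical, and a linear solve is used in place of an explicit matrix inverse purely as a numerical convenience.

```python
import numpy as np


def mean_times_in_samples(R0, Rc, R1, M: int, mu: int):
    """Evaluate (5.19) and (5.20) from the transient submatrices.

    R0, Rc, R1 : square arrays over the non-terminal states
    M          : block length in samples
    mu         : sample-time parameter of the change block
    Returns (mean time between false alarms, expected delay), in samples.
    """
    n = R0.shape[0]
    pi0 = np.zeros(n)
    pi0[0] = 1.0                      # start in the initial state (assumption)
    ones = np.ones(n)
    eye = np.eye(n)
    # (5.19): M * pi0' (I - R0)^{-1} 1
    t_samples = M * (pi0 @ np.linalg.solve(eye - R0, ones))
    # (5.20): mu + M * pi0' Rc (I - R1)^{-1} 1
    d_samples = mu + M * (pi0 @ Rc @ np.linalg.solve(eye - R1, ones))
    return t_samples, d_samples
```

For a single transient state with $R_0 = 0.9$ and $R_c = R_1 = 0.5$, $M = 4$ and $\mu = 2$, the two expressions evaluate to $4/(1 - 0.9) = 40$ and $2 + 4 \cdot 0.5/(1 - 0.5) = 6$ samples.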


5.5  Choosing  the  Local  Thresholds 

The asymptotic performance measure, $\eta$, was defined in Chapter 2. It was seen that
$1/\eta$ is the slope of the plot of $D$ versus $\log T$ as $T \to \infty$, and therefore minimizing
$D$ corresponds (asymptotically) to maximizing $\eta$. It was also shown that the lower
bound $\tilde{\eta} \le \eta$ is useful because, for large $T$, it enables us to upper bound the worst
expected delay as

$$D \lesssim \frac{\log T}{\tilde{\eta}}$$

and also it is not difficult to compute. The original derivation of $\tilde{\eta}$ appears in [4].

$\eta$ and $\tilde{\eta}$ can be defined in the same manner for the distributed detection case with
the caveat that the lower bound $\tilde{\eta}$ differs slightly for this problem. It turns out that
for the ML optimal procedure, the lower bound is

$$\tilde{\eta} = \frac{1}{M}\,\omega_0\,E_1[g_1(\mathbf{u})] \qquad (5.21)$$

where $\omega_0$ satisfies the moment generating function equality $E_0[\exp\{\omega_0 g_1(\mathbf{u})\}] = 1$.
The derivation of this bound is somewhat involved, and can be found in Appendix D.
It is also shown in this appendix that $\tilde{\eta}$ is the same for both $\mathcal{P}_1$ and $\mathcal{P}_2$. Thus,
the optimal and suboptimal procedures are asymptotically equivalent. This means
that for large values of $T$, there will be little difference in performance between the
two procedures. The performance calculations in the next section corroborate this.
Below, it is shown that $\tilde{\eta}$ can be useful in determining the local thresholds so as to
optimize the asymptotic performance of the overall procedure.

Let $\mathbf{h}_{loc} = [h_1, h_2, \ldots, h_L]$ denote the vector of fixed local thresholds. Ideally,
one would like to optimize the distributed system over all possible $\mathbf{h}_{loc} \in \mathbb{R}_+^L$. In
Chapter 2, we stated the objective of quickest detection: to minimize $D$ subject to a
lower bound on $T$. Using (5.19) and (5.20), this optimization problem can be written
explicitly as:

minimize $\mu + M\,\pi_0' R_c (I - R_1)^{-1}\mathbf{1}$ over $\mathbf{h}_{loc} \in \mathbb{R}_+^L$
subject to $M\,\pi_0' (I - R_0)^{-1}\mathbf{1} \ge c$

where $c$ is a positive constant. This approach is difficult to implement for several
reasons. The first reason is the existence of the change block. Here, the probability
$\beta_l(\mu)$ of locally detecting the change at sensor $l$ is dependent on $\mu$. Since $\mu$ is unknown,
it is not possible to select an optimum threshold a priori.$^8$ Another difficulty is
that the optimal $\mathbf{h}_{loc}$ is dependent on the desired mean time between false alarms,
or similarly, on the desired global threshold $h$. One can see this by noting that a
procedure with a higher $h$ will produce an alarm later than the same one with a lower
$h$; thus, the contribution of the change block to the final decision is more significant for
lower $h$, and so in this case one would like to choose the local thresholds to take fuller
advantage of the information extracted from this block. It is also not clear whether a
unique solution to this nonlinear optimization problem exists. Since $T$ is a function of
both $\mathbf{h}_{loc}$ and $h$, fixing $T$ does not lead to a fixed $h$, so $h$ is also dependent on $\mathbf{h}_{loc}$. One
might consider an adaptive search to maximize the performance over all $\mathbf{h}_{loc} \in \mathbb{R}_+^L$.
Unfortunately, the relatively long time required to compute the performance even for
fixed thresholds would be prohibitive in an iterative scheme.

$^8$One might consider assigning a uniform distribution to the arrival time within the change block
and averaging over all possible $\mu$.

A simpler method for selecting the local thresholds based on the asymptotic lower
bound $\tilde{\eta}$ in (5.21) is proposed here. The goal is to maximize $\tilde{\eta}$ over $\mathbf{h}_{loc} \in \mathbb{R}_+^L$. When
$g$ is the log-likelihood ratio for choosing between $K_0$ and $K_1$, direct evaluation of the
moment generating function identity [4] yields $\omega_0 = 1$. Since $M$ is a constant, the
goal is then

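$$\max_{\mathbf{h}_{loc}} E_1[g(\mathbf{u})] = \max_{\mathbf{h}_{loc}} E_1\left\{\sum_{l=1}^{L}\left[u_l \log\frac{\beta_l}{\alpha_l} + (1 - u_l)\log\frac{1-\beta_l}{1-\alpha_l}\right]\right\} \qquad (5.22)$$

Since the local decisions are independent, the expectation can be applied termwise.
Thus,

$$\begin{aligned} \max_{\mathbf{h}_{loc}} E_1[g(\mathbf{u})] &= \max_{\mathbf{h}_{loc}} \sum_{l=1}^{L} E_1\left\{u_l \log\frac{\beta_l}{\alpha_l} + (1 - u_l)\log\frac{1-\beta_l}{1-\alpha_l}\right\} \\ &= \max_{\mathbf{h}_{loc}} \sum_{l=1}^{L} D_b(\beta_l, \alpha_l) \\ &= \sum_{l=1}^{L} \max_{h_l} D_b(\beta_l, \alpha_l) \end{aligned}$$

where we note that $E_1[u_l] = \beta_l$ and $D_b(a, b) = a\log\frac{a}{b} + (1 - a)\log\frac{1-a}{1-b}$ is just
the binary discrimination function from information theory [3]. Thus, the problem
of globally maximizing the asymptotic lower bound amounts to selecting the local
thresholds to maximize the binary discrimination for each sensor.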

The local threshold $h_l$ is related to $\alpha_l$ and $\beta_l$ for $l = 1, 2, \ldots, L$ via the equations:

$$\alpha_l = \int_{h_l}^{\infty} \frac{1}{\sqrt{2\pi M}\,\sigma_l}\exp\left(-\frac{\tau^2}{2M\sigma_l^2}\right)d\tau = 1 - \Phi\left(\frac{h_l}{\sqrt{M}\,\sigma_l}\right)$$

$$\beta_l = \int_{h_l}^{\infty} \frac{1}{\sqrt{2\pi M}\,\sigma_l}\exp\left(-\frac{(\tau - M s_l)^2}{2M\sigma_l^2}\right)d\tau = 1 - \Phi\left(\frac{h_l - M s_l}{\sqrt{M}\,\sigma_l}\right)$$

The function $D_b(\beta_l(h_l), \alpha_l(h_l))$ has a unique maximum over $h_l \in [0, \infty]$, so the optimal
threshold is easy to compute. The explicit solution satisfies a transcendental
equation; therefore, we instead use a binary search procedure allowing us to get arbitrarily
close to the optimal threshold. The merits of using the above scheme for
choosing the local thresholds will be evaluated in Section 5.7 via a sensitivity analysis.
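
A numerical search for one sensor's threshold can be sketched as follows. The details of the binary search used in this chapter are not given, so the sketch substitutes a ternary (bracketing) search, which relies only on the stated uniqueness of the maximum. It further assumes that the block sum is $N(0, M\sigma^2)$ before the disorder and $N(Ms, M\sigma^2)$ after it, and that the optimum lies in the bracket $[0, Ms + 8\sqrt{M}\sigma]$. All function names are hypothetical. For very large $\sqrt{M}s/\sigma$ the false-alarm tail underflows to zero at the upper end of the bracket, so the bracket would need to be tightened in that regime.

```python
import math


def upper_tail(x: float) -> float:
    """1 - Phi(x) for the standard normal, computed with erfc for accuracy."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def local_objective(h: float, s: float, sigma: float, M: int) -> float:
    """Binary discrimination D_b(beta(h), alpha(h)) for one sensor."""
    scale = math.sqrt(M) * sigma
    alpha = upper_tail(h / scale)                 # local false-alarm probability
    beta = upper_tail((h - M * s) / scale)        # local detection probability
    one_minus_alpha = upper_tail(-h / scale)
    one_minus_beta = upper_tail(-(h - M * s) / scale)
    return (beta * math.log(beta / alpha)
            + one_minus_beta * math.log(one_minus_beta / one_minus_alpha))


def best_threshold(s: float, sigma: float, M: int, tol: float = 1e-8) -> float:
    """Ternary search for the maximizer of local_objective over the bracket."""
    lo, hi = 0.0, M * s + 8.0 * math.sqrt(M) * sigma
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if local_objective(m1, s, sigma, M) < local_objective(m2, s, sigma, M):
            lo = m1   # maximum lies to the right of m1
        else:
            hi = m2   # maximum lies to the left of m2
    return 0.5 * (lo + hi)
```

Because the objective separates across sensors, the full threshold vector would be obtained by calling `best_threshold` once per sensor with that sensor's $s_l$ and $\sigma_l$.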

As a sidenote, it is not difficult to show that regardless of the choice of $\mathbf{h}_{loc}$, the
optimal non-distributed procedure is always asymptotically better than its distributed
counterpart. Let $\eta_{dis}$ and $\eta_{non}$ denote the asymptotic performances for the distributed
and non-distributed procedures, respectively. Because the log-likelihood ratio is used
for each case, the lower bounds are tight. It is shown in Appendix C that

$$\eta_{non} = \sum_{l=1}^{L} \frac{s_l^2}{2\sigma_l^2}.$$

Also, from the above discussion, when the optimum local thresholds are used:

$$\eta_{dis} = \tilde{\eta}_{dis} = \frac{1}{M}\sum_{l=1}^{L} \max_{h_l} D_b(\beta_l, \alpha_l). \qquad (5.23)$$

The first equality in (5.23) is exact since the log-likelihood ratio is used at the fusion center
[4]. Now Corollary 4.4.2 in [3] provides the following bound:

$$D_b(\beta_l, \alpha_l) \le M\,I(q_{1,l}, q_{0,l}) \qquad (5.24)$$

where $q_{0,l}$ and $q_{1,l}$ are the densities of sample $x_l(n)$ under hypotheses $H_0$ and $H_1$,
respectively, and

$$I(q_{1,l}, q_{0,l}) = \int \log\frac{q_{1,l}(x)}{q_{0,l}(x)}\,q_{1,l}(x)\,dx = \frac{s_l^2}{2\sigma_l^2} \qquad (5.25)$$

is the Kullback-Leibler divergence. Combining (5.23)-(5.25), we have

$$\eta_{dis} \le \sum_{l=1}^{L} \frac{s_l^2}{2\sigma_l^2} = \eta_{non}.$$

5.6 Examples of the Performance Computations

In this section, we illustrate the performance of procedures $\mathcal{P}_0$-$\mathcal{P}_3$ for a distributed
detection system with $L = 4$ sensors. The sample noise at each sensor is standard
Gaussian. Two cases will be considered:

                         post-disorder signal strength, dB
    case                 l = 1    l = 2    l = 3    l = 4
    “strong signal”        0       -3       -6      -10
    “weak signal”        -20      -23      -26      -30

For the distributed procedures, the thresholds for the local detectors are obtained
using the method of Section 5.5.
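
To relate the table to the jump magnitudes $s_l$, the short sketch below converts a signal strength in dB to an amplitude. It assumes that the tabulated value is $10\log_{10}(s_l^2/\sigma_l^2)$, which is a common convention but is not stated explicitly here; the function name is hypothetical.

```python
def amplitude_from_snr_db(snr_db: float, sigma: float = 1.0) -> float:
    """Jump magnitude s for a given SNR in dB, assuming SNR = s^2 / sigma^2."""
    return sigma * 10.0 ** (snr_db / 20.0)


# Signal strengths from the table, in dB, for sensors l = 1..4.
strong_db = [0.0, -3.0, -6.0, -10.0]
weak_db = [-20.0, -23.0, -26.0, -30.0]

strong_amplitudes = [amplitude_from_snr_db(x) for x in strong_db]
weak_amplitudes = [amplitude_from_snr_db(x) for x in weak_db]
```

Under this convention, with unit-variance noise, 0 dB corresponds to $s = 1$ and $-20$ dB to $s = 0.1$.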

5.6.1  Procedures  Where  the  Jump  Magnitudes  Are  Known 

In Figure 5.5, a plot of $D$ vs. $T$ for the ML optimal procedure is shown for the
blocklength $M = 20$ and several values of $\mu$. This illustrates the effect of the disorder
arrival time on the expected delay. The average and worst-case performances are also
shown. To compute the former, the average expected delay over all $\mu$ for each $T$ is
determined (i.e., a uniform prior probability is assigned to $\mu$); for the latter, the maximum
expected delay over $\mu$ is determined. We observe that $D$ is lower when the disorder
occurs either very early or very late in the block. Since $\mu$ is unknown, only the average
and worst-case performance will be considered from here on.

The accuracy of all of the procedures was verified by performing Monte Carlo simulations.$^9$
Each of the procedures was implemented on a MasPar massively parallel
computer, enabling us to perform multiple runs simultaneously; in this case, each of
the 4096 processors executed a single run. In Figure 5.5, the circles are the values
generated for $h = 2, 4, 6, \ldots, 14$ in this manner. One can see the excellent agreement
between the simulated and computed performance throughout. The asterisks in Figure 5.5
show the computed values for $h = 16$, $18$, and $20$; simulated values are not
shown for these points due to the large computing time required to generate false
alarms in this range.

$^9$The simulated values are shown explicitly only in Figure 5.5. These values are not shown in the
other graphs so that the detail remains clear where there are several plots on one graph.
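
A serial Monte Carlo check of the same kind can be sketched in a few lines of Python. This is not the MasPar implementation described above. It simulates only the Page-type fusion rule with a zero lower boundary, and only the two stationary regimes (disorder never occurs, or disorder present from the first block), so the change block is not modelled. The increment is assumed to be the Bernoulli log-likelihood ratio $\sum_l c_l u_l + d$ built from the local probabilities `alpha` and `beta`; the function name and its parameters are hypothetical.

```python
import numpy as np


def simulate_page_blocks(p_one, alpha, beta, h, runs, rng, max_blocks=1_000_000):
    """Average number of blocks until S_n > h.

    p_one       : actual Pr{u_l = 1} per sensor used to generate the data
    alpha, beta : local false-alarm / detection probabilities used in the LLR
    rng         : a numpy.random.Generator
    """
    p_one = np.asarray(p_one, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    # Per-sensor weights and constant term of the assumed Bernoulli LLR.
    c = np.log(beta * (1 - alpha) / (alpha * (1 - beta)))
    d = np.sum(np.log((1 - beta) / (1 - alpha)))
    total_blocks = 0
    for _ in range(runs):
        s, n = 0.0, 0
        while s <= h and n < max_blocks:
            u = rng.random(p_one.size) < p_one      # local decisions for one block
            s = max(s + c[u].sum() + d, 0.0)        # Page recursion
            n += 1
        total_blocks += n
    return total_blocks / runs
```

Setting `p_one = alpha` estimates the mean number of blocks between false alarms, and `p_one = beta` estimates the delay in blocks. As the text notes, the false-alarm estimate becomes expensive as $h$ grows, since each run must wait for a rare event.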

The average and worst-case performance of the ML optimal, Page, and non-distributed
procedures are compared for the strong signal case in Figure 5.6. The
performance  of  the  non-distributed  procedure  is  uniformly  better  than  that  of  the 
other  procedures,  as  expected;  in  addition,  the  advantage  of  utilizing  all  of  the  sensor 
data  at  the  central  processor  increases  linearly  in  the  log  of  T.  It  is  interesting  to 
note  that  both  the  average  and  worst-case  performances  of  the  ML  optimal  and  Page 
procedures  are  virtually  identical,  with  the  ML  optimal  winning  out  only  by  a  small 
amount  in  some  places.  We  also  see  that  the  performance  degrades  with  increasing 
blocksize,  and  the  best  choice  is  M  =  1.  This  reflects  the  tradeoff  between  using 
large  enough  M  so  that  the  local  detectors  are  sufficiently  powerful  and  keeping  M 
small  so  that  the  detection  will  be  quick.  In  this  case,  the  disorder  has  a  large  enough 
magnitude  that  only  one  sample  is  required  for  the  local  detectors. 

The same computations are shown for the weak signal case in Figure 5.7. Again,
the non-distributed procedure is the best of the three, and the average and worst-case
performance for the ML optimal and Page procedures are nearly the same. In general,
the expected delays are much larger for the weak signal case; this is also expected,
since weaker signals will lead to local detectors with lower power. A major difference
between the weak and strong signal cases is the sensitivity of the performance with
respect to blocksize. In Figure 5.6, we see that the choice of $M = 10$ yields an
expected delay that differs from $M = 1$ by about 5 samples, while for $M = 20$, this
difference is closer to 10 samples, a significant percentage of the delay for $M = 1$.
However, for the case shown in Figure 5.7, the performance for $M = 500$ is very
close to that for $M = 1$. In addition, the difference in expected delay for these two
cases diminishes as $T$ increases. Therefore, the smaller the jump magnitude, the less
critical the choice of the blocksize is, though in either case an $M$ which is too large
will result in significant performance degradation.

Figure 5.6: Average and worst case performance of ML optimal, Page, and non-distributed
procedures. $L = 4$, strong signal case.

Figure 5.7: Average and worst case performance of ML optimal, Page, and non-distributed
procedures. $L = 4$, weak signal case.

5.6.2 Procedures Where the Jump Magnitudes Are Unknown

We now compute the performance of Hinkley’s test and compare it to that of the
ML optimal and non-distributed tests (i.e., procedures where the jump magnitudes
are known). The jump magnitudes are those of the strong signal case given in the
previous section. For Hinkley’s test, we consider two magnitudes for the minimum
jump: $\mathrm{SNR}_{min} = -10$ and $-20$ dB. In other words, we determine the parameters
for Page’s tests assuming that the disorder is of magnitude $\mathrm{SNR}_{min}$.

The results are shown in Figure 5.8. Here, the average performance is computed
(the results are similar when the worst-case performance is used). One can see immediately
the benefit of knowing the actual signal strength. The performance of the
Hinkley procedures diverges from that of the ML optimal procedure as $T$ increases,
regardless of the choice of $M$. Also, the performance for $\mathrm{SNR}_{min} = -10$ dB exceeds
that of $\mathrm{SNR}_{min} = -20$ dB, more so with larger $T$. This reflects the fact that the
better an idea one has about the jump magnitude, the better the performance that
can be achieved. Although not included here, a similar comparison was done for the
weak-signal scenario, and the results were analogous to the above.


Figure 5.8: Average performance of Hinkley procedure with $\mathrm{SNR}_{min} = -10$ and $-20$ dB,
the ML optimal procedure, and the non-distributed procedure. $L = 4$, strong signal case.

5.7 Sensitivity of Performance To Variation in the Local Thresholds

In Section 5.5, a procedure for determining the thresholds of the local detectors based
upon the lower bound of the asymptotic performance measure $\eta$ was presented. However,
it is not clear whether the resulting choice of thresholds is near the optimum,
for two reasons: (1) $\eta$ is an asymptotic performance measure, so the performance in
general may be inadequate, and (2) the tightness of the lower bound was not considered.
This motivates us to perform the following sensitivity analysis.

Let $\mathbf{h}_{loc}$ denote the $L$-dimensional vector of local thresholds obtained via the
method of Section 5.5, and let $\mathbf{h}_{per}$ denote a multiplicative perturbation of $\mathbf{h}_{loc}$ such
that

$$\mathbf{h}_{per} = c \cdot \mathbf{h}_{loc}, \quad c \in \mathbb{R}_+.$$

Thus, by varying the perturbation parameter $c$ and computing the performance using
$\mathbf{h}_{per}$ as the local thresholds, we can determine whether $\mathbf{h}_{loc}$ (i.e., $c = 1$) is a good
choice.

In Figure 5.9, we compute the average ML optimal performance for $M = 10$ and
values of $c$ ranging from 0.5 to 1.5. We see that asymptotically as $T$ increases, $c = 1$
is the best choice. We also note that a change of plus or minus ten percent does not
significantly affect the performance. From this, we conclude that the methodology of
Section 5.5 is reasonable.

In Figure 5.10, the analysis is repeated for the weak signal case where $M = 100$.
Again, we see that the best performance occurs when $c = 1$. However, there is little
difference in performance even for variations in $\mathbf{h}_{loc}$ of as much as fifty percent. In
other words, the performance is far less sensitive to the choice of local thresholds
when the jump magnitudes are smaller.

Figure 5.9: Sensitivity of average performance of the ML optimal procedure to perturbations
of the local thresholds. $M = 10$, strong signal case.

Figure 5.10: Sensitivity of average performance of the ML optimal procedure to perturbations
of the local thresholds. $M = 100$, weak signal case.

5.8 Block Length Effects


In this section, we address the question: For a particular distributed system, what
is the best choice for the blocksize $M$? We assume that the number of sensors $L$ is
fixed and known, and that the designer has a specified minimum desired $T$.

There are two considerations in the choice of $M$. The first is the expected delay
for the chosen value of $T$; in other words, the expected delay can be parameterized
as $D(T, M)$. We have seen in the previous section that the performance varies
significantly depending on the blocksize.

The  second  issue  is  the  cost  C(M )  associated  with  M.  Since  the  sampling  rate 
fs  is  fixed,  the  smaller  the  value  of  M,  the  more  frequently  the  local  decisions  are 
sent  to  the  fusion  center,  and  thus  the  higher  the  bandwidth  required  for  each  of 
the  channels.  Each  local  decision  is  represented  by  a  single  bit,  and  so  the  cost  is 
proportional  to  the  bit  rate.  Thus: 

$$C(M) = \frac{C_0 f_s}{M} \qquad (5.26)$$


where  C0  is  a  constant  with  units  “cost  per  unit  bit  rate”  which  reflects  the  relative 
cost  of  increasing  the  channel  bandwidth.  For  example,  a  system  with  M  =  1  costs 
twice  as  much  as  a  system  with  M  =  2,  which  costs  twice  again  as  much  as  with 
M  =  4;  i.e.  the  cost  is  linearly  proportional  to  the  inverse  of  the  blocksize.  Note, 
however,  that  one  is  not  restricted  to  costs  of  the  form  (5.26);  on  the  contrary,  a 
C(M )  which  more  accurately  models  the  cost  structure  for  a  particular  system  may 
be  substituted.  Nonetheless,  (5.26)  is  used  in  this  case  without  loss  of  generality. 

The best choice of $M$ is a compromise between the desire to quickly detect a
disorder and the need to minimize the system cost. To this end, we propose the
following design methodology. For procedure $\mathcal{P}_i$, $i = 1, 2, 3$, define the function

$$\mathcal{R}_i(T^*, M) = \min_{T \ge T^*} \frac{D_i(T, M)}{D_0(T^*)} - 1.$$

$\mathcal{R}_i(T^*, M)$ is a relative performance measure which reflects the sacrifice in performance
for using procedure $\mathcal{P}_i$ instead of the non-distributed procedure for the specified
minimum allowable mean time between false alarms $T^*$ and block size $M$. The
plot of $D$ vs. $\log T$ for the non-distributed case is continuous; therefore, any operating
point $T^*$ is achievable using $\mathcal{P}_0$. However, the fact that the set of possible local
decisions at any time is a discrete finite set (of cardinality $2^L$) means that a designer
may not be able to design a test operating at exactly $T^*$.$^{10}$ Therefore, the delay
corresponding to the smallest $T \ge T^*$ is used.

The overall desirability of different block lengths is determined by evaluating both
the relative performance function and the cost function over a range of $M$, and plotting
$\mathcal{R}_i(T^*, M)$ versus $C(M)$ for each procedure $\mathcal{P}_i$, $i = 1, 2, 3$. This allows a system
designer to decide whether an incremental cost increase will produce a worthwhile
improvement in performance.

In Figure 5.11, the above methodology is illustrated for the strong signal scenario
considered previously. Here $\mathcal{R}_i(T^*, M)$ is plotted versus $C(M)$ for $i = 1$, $2$, and $3$,
where for simplicity we take $C_0 = 1$ and $f_s = 1$. The plots are generated for $T^* = 10^6$
and $T^* = 10^{10}$. We see that in general, the relative performance deteriorates as the
blocklength $M$ increases; that is, as the system cost decreases. For the ML optimal
procedure, the best performance is achieved for $M = 1$; of course, this is also the
most expensive alternative. However, a designer might be willing to sacrifice some
performance in order to reduce the overall cost. In such cases, the plot of relative
performance versus cost makes it easier to select a compromise. For example, one
can see that little is sacrificed by using $M = 10$ rather than $M = 1$, and so this is
an attractive alternative. The aforementioned fact that the ML optimal ($i = 1$) and
Page ($i = 2$) procedures exhibit nearly identical performance is evident here as well.
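
The bookkeeping behind such a plot can be sketched as follows. The sketch assumes that, for each blocksize, the designer has a finite list of achievable operating points $(T, D)$ for the distributed procedure, together with the non-distributed delay at $T^*$. It also assumes a cost that is inversely proportional to $M$ with constants $C_0$ and $f_s$, as the text describes; the exact form used in `channel_cost` is an assumption. Both function names are hypothetical.

```python
def relative_performance(points, d0_at_target: float, t_target: float) -> float:
    """Relative loss of a distributed procedure with respect to the benchmark.

    points       : iterable of achievable (T, D) pairs for one blocksize M
    d0_at_target : non-distributed delay D_0(T*)
    t_target     : minimum allowable mean time between false alarms T*
    """
    feasible = [d for t, d in points if t >= t_target]
    if not feasible:
        raise ValueError("no achievable operating point with T >= T*")
    # Minimum delay among operating points that meet the false-alarm constraint.
    return min(feasible) / d0_at_target - 1.0


def channel_cost(M: int, c0: float = 1.0, fs: float = 1.0) -> float:
    """Cost proportional to the bit rate fs / M (assumed form)."""
    return c0 * fs / M
```

Evaluating both functions over a range of $M$ gives the pairs needed for a performance-versus-cost plot. Because only discrete operating points are achievable, two procedures compared at the same nominal $T^*$ may in fact be operating at different values of $T$.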

In Figure 5.12, the relative performance is shown for the weak signal case. Again
we see that for sufficiently large $M$, the performance rapidly decreases. However,
unlike in the strong signal case, the best choice of blocksize does not correspond to
the system with the highest cost. In fact, the best choice for $M$ is closer to 100,
and significant performance degradation doesn’t occur until $M$ is larger than 500.
Another observation is that at $M = 100$, it appears that the performance of the Page
procedure is better than that of the ML optimal, yet this is not so. This appearance
is a consequence of the fact that we cannot design a test for exactly an arbitrary $T$,
as we mentioned earlier; in this instance, the smallest $T \ge T^*$ for the Page procedure
was closer to $T^*$ than the corresponding $T$ for the ML optimal procedure, which led
to $\mathcal{R}_1(T^*, M) > \mathcal{R}_2(T^*, M)$.

$^{10}$One might consider using a randomized test to exactly achieve $T^*$.

One  question  arises  from  the  above  observations:  Why  is  it  better  to  use  larger 
blocksizes  when  detecting  small  signals,  even  though  the  goal  is  quickest  detection? 
The  optimal  procedure  for  detecting  one  mean  versus  another  in  Gaussian  noise  given 
M  observables  is  to  compare  the  sum  of  the  samples  to  a  threshold.  If  the  mean  is 
large,  it  will  require  relatively  few  samples  in  order  to  get  a  test  of  reasonable  power. 
For the strong signal case we considered, $M = 1$ was sufficiently large to enable
local  tests  to  make  decisions  with  high  accuracy.  Although  a  higher  M  would  have 
produced  even  more  powerful  local  tests,  this  increase  is  more  than  offset  by  the 
additional  delay  in  detection  incurred.  For  weaker  signals,  M  must  be  larger  in  order 
to  produce  local  tests  with  sufficiently  high  power.  To  clarify  this  point,  suppose  that 
M  =  1  is  chosen.  The  overall  procedure  then  amounts  to  hard-limiting  the  sensor 
measurements  and  using  the  outcomes  in  a  sequential  procedure  at  the  fusion  center, 
a well-known non-parametric procedure. The fact that $M = 1$ is the best choice in
Figure 5.11 shows that for strong signals, it is not necessary to process the local data
in an optimal fashion (i.e., summing the samples) in order to get good performance.
On the other hand, Figure 5.12 illustrates that for weaker signals, it is necessary to
involve many samples in each local decision in order to achieve reasonable power.
This tradeoff has been explored previously in the context of quickest detection for
scalar (non-distributed) signals [12].

5.9  Conclusions 

In  this  chapter,  we  examined  the  quickest  distributed  detection  problem.  The  local 
detection  procedure  was  to  simply  compare  the  sum  of  successive  blocks  of  sensor  data 
to  a  threshold,  and  the  disorder  time  was  assumed  unknown.  Several  alternatives  for 
the  fusion  procedure  were  considered.  For  the  case  when  the  magnitudes  of  the  jumps 
are  known,  we  derived  the  ML  optimal  procedure  along  with  a  suboptimal  version 
which  requires  less  computation  to  implement.  When  the  magnitudes  are  unknown, 
a similar procedure designed to react to a nominal jump is used. Each of these
procedures was shown to have a recursive implementation.

The  performance  of  the  above  procedures  was  computed  by  modelling  the  test 
statistics  as  a  Markov  process,  allowing  us  to  get  explicit  expressions  for  the  average 
sample numbers before and after the disorder. Analytical expressions for the
performance are important since they allow the system designer to evaluate alternatives
without  extensive  Monte  Carlo  simulations;  in  this  case,  the  analytic  computations 
were  verified  via  simulation.  It  was  shown  that  the  performances  of  the  ML  optimal 
and  suboptimal  procedures  are  asymptotically  equivalent,  and  so  in  practice  the  latter 
test  might  be  the  better  alternative. 

A  simple  method  for  choosing  the  thresholds  of  the  local  detectors  based  upon 
a  lower  bound  on  the  asymptotic  performance  measure  was  introduced.  Sensitivity 
analysis  reveals  that  the  procedure  leads  to  near  optimal  performance.  This  eliminates 
the  necessity  of  solving  a  set  of  coupled  nonlinear  equations  to  obtain  the  optimal 
threshold  settings,  which  is  the  common  situation  in  decentralized  detection  problems. 

Finally, a methodology was developed to determine the best choice of blocklength
for the local tests. This involved a tradeoff between decision delay and increased
system bandwidth (i.e., communication cost). For the strong signal case, the results
were as expected: a decrease in communication cost results in a deterioration in
performance. Surprisingly, for the weak signal case, there is a range where lowering
the communication cost results in an improvement in performance.

There  are  several  interesting  directions  for  future  work.  First,  although  we  used 
simple  summing  devices  at  the  local  detectors,  the  analysis  could  be  easily  extended 
to  include  other  types  of  detection  schemes.  For  example,  if  the  signal  were  time 
varying,  two  options  might  be  viable:  for  coherent  detection,  an  estimator-correlator 
could  be  used,  while  for  noncoherent  detection,  a  generalized  energy  detector  might 
be  appropriate.  Another  issue  is  the  assumption  of  the  independence  of  the  samples. 
If  the  samples  are  correlated,  then  so  will  be  the  local  decisions.  For  this  case,  one 
approach  would  be  to  design  Page’s  test  using  the  conditional  densities  (conditioned 
on  the  past  decisions)  rather  than  the  marginals;  however,  if  the  blocksize  is  large, 
the  decisions  may  only  be  slightly  correlated,  and  so  such  a  modification  might  not 
be  necessary.  Finally,  it  would  be  useful  to  develop  a  procedure  for  estimating  the 
location  of  a  disorder  using  a  distributed  detection  scheme.  This  could  be  useful,  for 
instance,  in  the  case  where  distributed  sensors  monitor  seismic  activity,  and  upon 
detecting an earthquake, one wishes to locate the epicenter.

5.10 Appendices

5.10.A Extension to the Continuous Time Case

Here  we  present  the  continuous  time  analogue  to  the  distributed  detection  problem. 
The  observables  at  each  sensor  are  typically  modelled  using  a  stochastic  differential 
equation  [4,  19]: 


$$ dx_i(t) = s_i \, u(t - t_0)\, dt + \sigma_i \, dw_i(t), \qquad i = 1, 2, \ldots, L $$

where $w_i(t)$ is a Wiener process at sensor $i$ scaled by $\sigma_i$, $u(\cdot)$ is the unit step function,
$t_0$ is the disorder time, and $s_i$ is the drift at sensor $i$ resulting from the disorder.

In the discrete time case, the local decisions are produced by comparing sums of
$N$ successive samples to a threshold. In the present case, the summation is replaced
by the integral of the sensor measurements over a time window of length $N T_s$. In
particular, $u_i(k)$ (the $k$th decision at sensor $i$) is

$$ u_i(k) = \begin{cases} 1, & w_i(k) > h_i \\ 0, & \text{otherwise} \end{cases} $$

where

$$ w_i(k) = \int_{(k-1) N T_s}^{k N T_s} x_i(t)\, dt $$

Let  t  -  jr&r  and  let  k0  and  u  denote  the  change  block  and  the  disorder  time 
within block $k_0$ as before. In the continuous case, the disorder may occur at any time,
not just at discrete sample instants; thus $\mu = k_0 N T_s - t_0$ with $t_0 \in [(k_0 - 1) N T_s,\ k_0 N T_s]$. Now
the distribution of the local decisions is exactly that of (5.5)-(5.6), with the exception
that $\mu$ now takes on continuous rather than discrete values. This means that all of the
techniques developed for the discrete time case can also be implemented in continuous
time.

5.10.B Conversion From Blocks To Samples

Let $N$ and $\mathcal{N}$ denote the stopping time in terms of blocks and samples, respectively, where a block consists
of $M$ samples. Let $p_i(k)$ denote the probability that the alarm occurs at block $k$ under
$H_i$, and let $\mu$ denote the number of samples taken from $H_1$ present in the change
block $k_0$ (see footnote 11). Since global decisions are only made at the ends of a block (i.e., every
$M$ samples), under $H_0$ we have

$$
\begin{aligned}
\mathcal{N}_0(\theta_0) &= M p_0(1) + 2M p_0(2) + 3M p_0(3) + \cdots \\
&= M \sum_{k=1}^{\infty} k\, p_0(k) \\
&= M \cdot E_0 N
\end{aligned}
$$

Under $H_1$, detecting the disorder at block $k_0$ corresponds to a delay of $\mu$ samples. For
each additional block before the alarm, $M$ additional samples are required. Therefore:

$$
\begin{aligned}
\mathcal{N}_0(\theta_1) &= \mu\, p_1(1) + (\mu + M)\, p_1(2) + (\mu + 2M)\, p_1(3) + \cdots \\
&= \sum_{k=1}^{\infty} [\mu + M(k-1)]\, p_1(k) \\
&= E_1[\mu + M(N-1) \mid \mu] \\
&= \mu + M \left( E_1[N \mid \mu] - 1 \right)
\end{aligned}
$$

where the last equality results from the linearity of the conditional expectation.
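As a quick numerical check of the two conversions above, the following sketch evaluates the term-by-term sums and the closed forms for a small alarm distribution. The function names and the example probabilities are illustrative assumptions, not values from the text.

```python
def asn_samples_h0(p0, M):
    """Term-by-term sum M*p0(1) + 2M*p0(2) + ... for a finite alarm pmf p0."""
    return sum(k * M * pk for k, pk in enumerate(p0, start=1))


def delay_samples_h1(p1, mu, M):
    """Term-by-term sum mu*p1(1) + (mu + M)*p1(2) + ... for a finite alarm pmf p1."""
    return sum((mu + M * (k - 1)) * pk for k, pk in enumerate(p1, start=1))


def mean_blocks(p):
    """Expected stopping time in blocks, E[N] = sum_k k * p(k)."""
    return sum(k * pk for k, pk in enumerate(p, start=1))


# Hypothetical example: blocks of M = 4 samples, mu = 3 post-change samples
# in the change block, and a made-up alarm pmf over blocks 1, 2, 3.
M, mu = 4, 3
p1 = [0.5, 0.3, 0.2]
direct = delay_samples_h1(p1, mu, M)       # sum over blocks
closed = mu + M * (mean_blocks(p1) - 1)    # mu + M (E_1[N | mu] - 1)
print(direct, closed)
```

Both quantities agree up to floating-point rounding, which is all the last equality in the derivation asserts.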


11 Notice that $p_1(k)$ is a function of $\mu$.

5.10.C Derivation of the Optimal Non-Distributed Procedure

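Here we derive the optimal test when all of the data is available at the central processor.
For convenience, we adopt the following vector notation:

$$
\begin{aligned}
\mathbf{s} &= [s_1, s_2, \ldots, s_L]^T \\
\mathbf{n}(i) &= [n_1(i), n_2(i), \ldots, n_L(i)]^T \\
\mathbf{x}(i) &= \begin{cases} \mathbf{n}(i), & i = 1, 2, \ldots, m-1 \\ \mathbf{n}(i) + \mathbf{s}, & i = m, m+1, \ldots \end{cases}
\end{aligned}
$$

where $m$ is the change time, and $\mathbf{x}(i)$ is a snapshot of the sensor data at sample time
$i$. Let the distribution of $\mathbf{n}(i)$ be

$$ f(\mathbf{n}(i)) = \left( \frac{1}{2\pi} \right)^{L/2} |\Sigma|^{-1/2} \exp\left( -\tfrac{1}{2}\, \mathbf{n}^T(i)\, \Sigma^{-1}\, \mathbf{n}(i) \right), \quad \forall i \tag{C.1} $$

where $\Sigma = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_L^2)$. The log-likelihood ratio for a disorder occurring at
sample $m$ within a block of $n$ samples is

$$ \ell_n(\{\mathbf{x}(i)\}_{i=1}^{n}; m) = \log \frac{\prod_{i=1}^{m-1} f(\mathbf{x}(i)) \prod_{i=m}^{n} f(\mathbf{x}(i) - \mathbf{s})}{\prod_{i=1}^{n} f(\mathbf{x}(i))} = \sum_{j=m}^{n} \log \frac{f(\mathbf{x}(j) - \mathbf{s})}{f(\mathbf{x}(j))} \tag{C.2} $$

Substituting (C.1) into (C.2), we have

$$
\begin{aligned}
\ell_n(\{\mathbf{x}(i)\}_{i=1}^{n}; m) &= \sum_{j=m}^{n} \left\{ \mathbf{s}^T \Sigma^{-1} \mathbf{x}(j) - \tfrac{1}{2}\, \mathbf{s}^T \Sigma^{-1} \mathbf{s} \right\} \\
&= \sum_{j=m}^{n} \sum_{l=1}^{L} \frac{s_l}{\sigma_l^2} \left( x_l(j) - \frac{s_l}{2} \right) = \sum_{j=m}^{n} z_j
\end{aligned}
\tag{C.3}
$$

where the definition of $z_j$ is obvious. The likelihood ratio is now maximized over all
possible change times, resulting in the test:

$$ S_n = \max_{1 \le m \le n} \sum_{j=m}^{n} z_j \tag{C.4} $$

$$ N = \inf\{ n \mid S_n > h \} \tag{C.5} $$

This is simply the univariate Page’s test, and it is shown in [11] that the recursive
test

$$ S_n = \max\{ S_{n-1} + z_n,\ 0 \} $$

$$ N = \inf\{ n \mid S_n > h \} $$

is equivalent to (C.4)-(C.5).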
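The equivalence can be illustrated numerically. The sketch below (function names are illustrative, and integer increments are used only so that the comparison is exact) computes the statistic of (C.4) by brute force and the recursive statistic, assuming the recursion starts from zero. The recursive value equals the brute-force value clipped at zero, so for any threshold h > 0 the two stopping times coincide.

```python
import random


def page_direct(z):
    """S_n = max over 1 <= m <= n of sum_{j=m}^{n} z_j, for every n (form of (C.4))."""
    stats = []
    for n in range(1, len(z) + 1):
        stats.append(max(sum(z[m - 1:n]) for m in range(1, n + 1)))
    return stats


def page_recursive(z):
    """S_n = max(S_{n-1} + z_n, 0), started from S_0 = 0 (an assumption of this sketch)."""
    s, stats = 0, []
    for zn in z:
        s = max(s + zn, 0)
        stats.append(s)
    return stats


def first_crossing(stats, h):
    """First index n (1-based) with S_n > h, or None if the threshold is never crossed."""
    for n, s in enumerate(stats, start=1):
        if s > h:
            return n
    return None


rng = random.Random(0)
z = [rng.randint(-3, 3) for _ in range(120)]   # made-up increments
direct, recursive = page_direct(z), page_recursive(z)
print(first_crossing(direct, 5), first_crossing(recursive, 5))
```

The brute-force form costs on the order of n operations per new sample (more in this naive implementation), while the recursion needs a single addition and comparison, which is why the recursive form is the one used in practice.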

The asymptotic performance measure can now be computed. Since the log-likelihood
ratio is used to process the data, the lower bound $\tilde{\eta} \le \eta$ is tight [4].
Therefore,

$$ \eta = \tilde{\eta} = E_1 z_j = \sum_{l=1}^{L} \frac{s_l^2}{2 \sigma_l^2} $$

Thus, the asymptotic performance is proportional to the sum of the signal to noise
ratios at the sensors.
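A small sketch of this computation follows; the function names and the example drifts and noise levels are hypothetical. Because the per-sample statistic is linear in the data, evaluating it at the post-change mean (x = s) gives its expectation under the post-change hypothesis, which should match the closed-form sum.

```python
def centralized_eta(s, sigma):
    """Sum over sensors of s_l^2 / (2 sigma_l^2)."""
    return sum(sl ** 2 / (2.0 * sg ** 2) for sl, sg in zip(s, sigma))


def z_statistic(x, s, sigma):
    """Per-sample statistic: sum_l (s_l / sigma_l^2) * (x_l - s_l / 2)."""
    return sum(sl / sg ** 2 * (xl - sl / 2.0) for xl, sl, sg in zip(x, s, sigma))


# Hypothetical two-sensor example.
s = [1.0, 2.0]        # drifts after the disorder
sigma = [1.0, 2.0]    # noise standard deviations
print(centralized_eta(s, sigma), z_statistic(s, s, sigma))
```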


5.10.D Computation of $\tilde{\eta}$ for the Distributed Procedures

Here, it is shown that $\tilde{\eta}$ is a valid lower bound on the asymptotic performance measure
$\eta$ for the distributed detection case. Throughout, $N$ and $\mathcal{N}$ denote stopping times in
terms of blocks and samples, respectively.

Consider the following hypothesis testing problem. Let $\mathbf{u}(n) = [u_1(n), \ldots, u_L(n)] \in \{0, 1\}^L$
denote the snapshot of local decisions for block $n$ such that

$$ \Pr\{u_l(n) = 1\} = \begin{cases} \alpha_l, & \text{under hypothesis } K_0 \\ \beta_l, & \text{under hypothesis } K_1 \end{cases} \qquad l = 1, 2, \ldots, L \tag{D.1} $$

where at the disorder time the transition $K_0 \to K_1$ occurs. Define the function
$g : \{0, 1\}^L \to \Re$. Page’s procedure for problem (D.1) is:

$$ \mathcal{P}_R : \begin{cases} R_n = \max\{ R_{n-1} + g(\mathbf{u}(n)),\ 0 \}, \quad R_0 = 0 \\ N_R = \inf\{ n \mid R_n > h \} \end{cases} $$

This test is optimal when $g$ is the log-likelihood ratio for testing $K_0$ vs. $K_1$. One
may recognize that problem (D.1) is the same as (5.6) for the case when the disorder
occurs at the beginning of a block, i.e. $\mu = M$ (so $K_1' = K_1$). In [4], the following
bounds on ASN are derived:

$$ E_1 N_R \le \frac{h + \gamma}{E_1[g(\mathbf{u})]} \tag{D.2} $$

$$ E_0 N_R \ge \exp\{ h\, \omega_0 \} \tag{D.3} $$

where $\gamma > 0$ is a finite constant and $\omega_0$ satisfies the moment generating function
identity under $H_0$, $E_0[\exp\{\omega_0\, g(\mathbf{u})\}] = 1$. Since $g$ is the log-likelihood ratio, $\omega_0 = 1$
(see footnote 12). Since, here, the disorder is assumed to occur only at the beginning of a block
($\mu = M$), the bounds on ASN are simply given by (D.2)-(D.3) multiplied by $M$. Of
course, we are actually interested in the performance for arbitrary $\mu$.
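The claim that the log-likelihood ratio satisfies the moment generating function identity with unit exponent can be checked by direct enumeration over all decision vectors. The sketch below assumes the local decisions are independent across sensors, and the decision probabilities are made-up values; the function names are likewise illustrative.

```python
from itertools import product
from math import exp, log


def llr(u, alpha, beta):
    """Log-likelihood ratio of a vector of independent binary decisions (K_1 vs. K_0)."""
    total = 0.0
    for ul, a, b in zip(u, alpha, beta):
        total += log(b / a) if ul == 1 else log((1.0 - b) / (1.0 - a))
    return total


def prob(u, p):
    """Probability of decision vector u when sensor l decides 1 with probability p[l]."""
    out = 1.0
    for ul, pl in zip(u, p):
        out *= pl if ul == 1 else (1.0 - pl)
    return out


def mgf_under_k0(alpha, beta, omega=1.0):
    """E_0[exp(omega * g(u))] with g the log-likelihood ratio, by enumeration."""
    vectors = product((0, 1), repeat=len(alpha))
    return sum(prob(u, alpha) * exp(omega * llr(u, alpha, beta)) for u in vectors)


def mean_llr_under_k1(alpha, beta):
    """E_1[g(u)], the quantity in the denominator of the delay bound."""
    vectors = product((0, 1), repeat=len(alpha))
    return sum(prob(u, beta) * llr(u, alpha, beta) for u in vectors)


alpha = [0.10, 0.20, 0.05]   # hypothetical Pr{u_l = 1} before the disorder
beta = [0.70, 0.60, 0.80]    # hypothetical Pr{u_l = 1} after the disorder
print(mgf_under_k0(alpha, beta), mean_llr_under_k1(alpha, beta))
```

The first printed value is 1 up to rounding, and the second is a positive number (a Kullback-Leibler divergence), as the bounds require.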

We now derive ASN bounds that are applicable for any $\mu \in \{1, 2, \ldots, M\}$ (i.e.
the problem in (5.6)) for procedures of the form

$$ \mathcal{P}_S : \begin{cases} S_n = \max\{ S_{n-1} + g(\mathbf{u}(n)),\ \psi(\mathbf{u}(n)) \}, \quad S_0 = 0 \\ N_S = \inf\{ n \mid S_n > h \} \end{cases} $$

where $\psi \in \Re^+$ is bounded. $\mathcal{P}_S$ is a generic procedure which includes the ML optimal
procedure $\mathcal{P}_1$ and the suboptimal procedure $\mathcal{P}_2$ (the latter, by letting $\psi = 0$). Let
$T_S$ and $D_S$ denote the expected stopping times in samples for $\mathcal{P}_S$ under $H_0$ and $H_1$,
respectively.

Proposition 5:

$$ D_S \le M \left( 1 + \frac{h + \gamma}{E_1[g(\mathbf{u})]} \right) \tag{D.4} $$

12 Further details regarding $\gamma$ and $\omega_0$ are given in [4]. However, for the present purposes all
necessary information is contained in this appendix.

Proof:

Consider the test

$$ \mathcal{P}_U : \begin{cases} U_n = \begin{cases} 0, & n = 1 \\ \max\{ U_{n-1} + g(\mathbf{u}(n)),\ 0 \}, & n = 2, 3, \ldots \end{cases} \\ N_U = \inf\{ n \mid U_n > h \} \end{cases} $$

This procedure differs from $\mathcal{P}_S$ in that: i) the first block of data (stage $n = 1$) is
neglected, and ii) $\psi(\cdot)$ is replaced by 0. At first, it may be unclear as to why $\mathcal{P}_U$ was
chosen. There are two reasons for this choice. First, by ignoring the first block (of
which $\mu$ samples are from $H_1$), we have constructed a procedure that is independent
of $\mu$; this fact, along with replacing $\psi(\cdot)$ by 0, will enable us to relate the expected
stopping time of $\mathcal{P}_S$ to the bound in (D.2). Second, the ASN of $\mathcal{P}_U$ upper bounds
that of $\mathcal{P}_S$. To see this, fix a particular realization of $\{\mathbf{u}(n)\}_{n=1}^{\infty}$. By comparing the
test statistics termwise, it is clear that $U_n \le S_n$ for all $n$, and since this is true for
any realization

$$ E_1 N_S \le E_1 N_U \tag{D.5} $$

That is, it will take $U_n$ longer to reach the boundary $h$ than it will $S_n$. Also, since
$\mathcal{P}_U$ is independent of $\mu$,

$$ E_1[N_U \mid \mu = M] = E_1 N_U \tag{D.6} $$

Now when $\mu = M$, the expected sample path of $U_n$ is the same as that of $R_n$ delayed
(shifted to the right) by one unit. Thus, the expected stopping time of $\mathcal{P}_U$ satisfies:

$$ E_1[N_U \mid \mu = M] = 1 + E_1[N_R \mid \mu = M] \tag{D.7} $$

Combining (D.2) with (D.5)-(D.7) gives

$$ E_1 N_S \le 1 + \frac{h + \gamma}{E_1[g(\mathbf{u})]} \tag{D.8} $$

Finally, using (5.4) to convert into units of samples along with (D.8):

$$
\begin{aligned}
D_S &= \mathcal{N}_0(\theta_1) \\
&= (E_1 N_S - 1) M + \mu \\
&\le M\, \frac{h + \gamma}{E_1[g(\mathbf{u})]} + \mu \\
&\le M \left( 1 + \frac{h + \gamma}{E_1[g(\mathbf{u})]} \right)
\end{aligned}
$$

where the last step uses $\mu \le M$.


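Proposition 6: There exists some finite $\rho > 0$ such that for $h > \rho$:

$$ T_S \ge M \exp\{ (h - \rho)\, \omega_0 \} \tag{D.9} $$

Proof:

Define the procedure

$$ \mathcal{P}_V : \begin{cases} V_n = \max\{ V_{n-1} + g(\mathbf{u}(n)),\ \rho \}, \quad V_0 = \rho \\ N_V = \inf\{ n \mid V_n > h \} \end{cases} $$

for some $h > \rho$ (see footnote 13), where

$$ \rho = \max_{\mathbf{u} \in \{0, 1\}^L} \psi(\mathbf{u}) $$

and $\rho < \infty$ since $\psi$ is bounded. For any fixed sample realization of $\{\mathbf{u}(n)\}$,
$S_n \le V_n$, $\forall n$. Therefore

$$ E_0 N_S \ge E_0 N_V \tag{D.10} $$

Now consider the procedure

$$ \mathcal{P}_W : \begin{cases} W_n = \max\{ W_{n-1} + g(\mathbf{u}(n)),\ 0 \}, \quad W_0 = 0 \\ N_W = \inf\{ n \mid W_n > h - \rho \} \end{cases} $$

13 Note that the restriction $h > \rho$ will not pose a problem, since we will eventually be taking the
limit as $h \to \infty$.

This is identical to procedure $\mathcal{P}_V$ except that the lower boundary is shifted from $\rho$ to
0, and the upper boundary is shifted from $h$ to $h - \rho$. It is not difficult to see that
for a particular sample path $W_n = V_n - \rho$, $\forall n$. Therefore,

$$ E_0 N_V = E_0 N_W \tag{D.11} $$

Now procedure $\mathcal{P}_W$ differs from $\mathcal{P}_R$ only in that the threshold is shifted by $\rho$. Thus,
the lower bound (D.3) also applies to the former, so we have

$$ E_0 N_W \ge \exp\{ (h - \rho)\, \omega_0 \} \tag{D.12} $$

Finally, combining (D.10)-(D.12) and applying (5.3), we obtain

$$
\begin{aligned}
T_S &= \mathcal{N}_0(\theta_0) \\
&= M \cdot E_0 N_S \\
&\ge M \cdot E_0 N_W \\
&\ge M \exp\{ (h - \rho)\, \omega_0 \}
\end{aligned}
$$

By substituting (D.4) and (D.9) into the definition of $\eta$, we get the desired result:

$$
\begin{aligned}
\eta &= \lim_{h \to \infty} \frac{\log T_S}{D_S} \\
&\ge \lim_{h \to \infty} \frac{(h - \rho)\, \omega_0 + \log M}{M \left( 1 + \dfrac{h + \gamma}{E_1[g(\mathbf{u})]} \right)} \\
&= \frac{\omega_0}{M}\, E_1[g(\mathbf{u})] = \tilde{\eta}
\end{aligned}
$$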
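The sample-path comparisons used in the two proofs can be checked numerically. The sketch below runs the four recursions on one common realization and tests the termwise relations. The increments and floor values are made-up integers (so that the comparisons are exact) and are drawn independently of each other, which is more general than the setting of the appendix, where both are functions of the same decision vector; all names are illustrative.

```python
import random


def sample_paths(g_vals, psi_vals, rho):
    """Return the list of (S_n, U_n, V_n, W_n) along one realization."""
    S, U, V, W = 0, 0, rho, 0
    paths = []
    for n, (g, psi) in enumerate(zip(g_vals, psi_vals), start=1):
        S = max(S + g, psi)                 # generic procedure with floor psi
        U = 0 if n == 1 else max(U + g, 0)  # first block ignored, floor 0
        V = max(V + g, rho)                 # floor raised to rho
        W = max(W + g, 0)                   # V shifted down by rho
        paths.append((S, U, V, W))
    return paths


rng = random.Random(1)
g_vals = [rng.randint(-3, 2) for _ in range(500)]    # hypothetical increments g(u(n))
psi_vals = [rng.randint(0, 2) for _ in range(500)]   # hypothetical floors, 0 <= psi <= rho
rho = 2
paths = sample_paths(g_vals, psi_vals, rho)
print(all(U <= S <= V and W == V - rho for S, U, V, W in paths))
```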
Chapter 6

An Adaptive Procedure for Quickest Detection

6.1 Introduction

In Chapter 2, the problem of detecting a shift in the mean of a sequence of independent
random variables from $\theta_0$ to $\theta_1$ was considered. There, $H_0$ and $H_1$ were defined as
the “noise only” and “signal plus noise” hypotheses, respectively. It was shown that,
when both $\theta_0$ and $\theta_1$ are known, the optimal procedure based on Lorden’s criterion
is Page’s test implemented using the log-likelihood ratio.

While the assumption of known $\theta_0$ and $\theta_1$ is convenient for simplifying the problem,
in many applications one or both parameters may not be known exactly. In this
chapter, procedures for detecting a shift of unknown magnitude in the mean of a
random process are investigated. It is assumed throughout that $\theta_0$ is known, but that
$\theta_1$ is unknown. This is a reasonable assumption, since it is often the case that the state
of a system is known or can be adequately estimated before the disorder occurs. For
example, in radar applications, the return under the ambient noise hypothesis (that is,
when no target is present) may be well modelled as zero mean white Gaussian noise.
However, when a target does appear, the strength of the return, which induces a
proportional change in the mean of the observables, is dependent on several variables;
these include the distance and size of the target, as well as propagation effects such as
scattering and multipath. In such an instance, if an incorrect value for $\theta_1$ were used
to model the system, the actual performance could deviate greatly from that which
was computed based on the assumed parameter values.

The actual distribution of the random variables depends on the particular application.
We choose to focus on the case of Poisson observables, although the basic
techniques can be used for other distributions as well. Disorder problems with Poisson
observables have potential applications in many areas. For example, various medical
imaging techniques involve the generation of a picture whose pixel intensities are
proportional to the number of photons incident on the detector. In many cases, the
safety of the patient necessitates keeping the radiation dose at a minimum, resulting
in relatively low photon counts; in this case, the Gaussian assumption often used in
image processing may be invalid (see footnote 1). A sequential detection scheme which
incorporates the Poisson assumption directly could be used in the line-by-line detection of
boundaries in the image. There are also queueing system applications: one might
wish to detect changes in highway traffic flow or in packet arrival times at a server,
both of which are often modelled using the Poisson distribution. Finally, in optical
communications, the variation in the arrival rate of photons at the receiver could be
monitored.

The problem is stated precisely in Section 2. Next, in Section 3, we review some
of the commonly used approaches for detecting jumps of unknown magnitude in
the mean and discuss the advantages and disadvantages of each. In Section 4, we
introduce a new adaptive procedure for detecting such a change with the following
properties: i) the procedure is recursive, making it useful where an on-line algorithm is
desired, and ii) the performance is similar to that of the optimal Page’s test when the
true $\theta_1$ is known. This procedure consists of two independent stages which operate
sequentially. The first stage is a version of Page’s test that is useful in detecting
jumps of at least some minimum known magnitude. The second stage is an adaptive
version of the classical sequential probability ratio test which incorporates an on-line
estimate of the mean of the observables. In Section 5, the performance of the adaptive
procedure is analyzed and computed for several examples. Due to the difficulty of
obtaining closed-form analytical expressions, most of the results in this chapter are
based on Monte Carlo simulations (see footnote 2).

1 In other words, there may be an insufficient number of samples for the Central Limit Theorem
(leading to the Gaussian assumption) to be applicable.


6.2  Problem  Statement 

Let $f(x \mid \theta)$ denote a density function with mean $\theta$. Let $X_1, X_2, \ldots$ denote a sequence
of random variables generated under the following hypotheses:

$$ H_0 : X_i \sim f_0(x) = f(x \mid \theta_0), \qquad i = 1, 2, \ldots, m-1 $$

$$ H_1 : X_i \sim f_1(x) = f(x \mid \theta_1), \quad \theta_1 \in \Theta, \qquad i = m, m+1, \ldots $$

Here, $\Theta$ is the set of all permissible values of $\theta_1$. Throughout, we will take

$$ \Theta = \{ \theta \mid \theta \ge \theta_0 + \nu_0 \} $$

where $\nu_0 > 0$ is the minimum possible jump in the magnitude of the mean, which is
assumed to be known (see footnote 3). That is to say, $\theta_1 = \theta_0 + \nu$, where $\nu \ge \nu_0$. Since $\theta_0$ and $\theta_1$ are
constants, we see that the system undergoes a one-time shift from $\theta = \theta_0$ to $\theta = \theta_1$ at
the disorder time $m$. The goal is to determine the presence of the disorder as quickly
as possible. In other words, we wish to minimize the expected delay in detection $D$
for a desired mean time between false alarms $T$ (see footnote 4).

2 An earlier version of this procedure appears in [5].

3 In many applications, the disorder can take on a continuum of values. In such cases, $\nu_0$ is chosen
to be the minimum change of interest to the designer, as opposed to the minimum change which can
occur.

As discussed in Chapter 2, when both $\theta_0$ and $\theta_1$ are known, the optimal procedure
is Page’s test implemented using the log-likelihood ratio processor, $g(x) = \log [f_1(x) / f_0(x)]$.
However, it is often the case that one or both of $\theta_0$ and $\theta_1$ are unknown. In the
sequel, we investigate the case where the pre-disorder mean $\theta_0$ is known, but the jump
magnitude $\nu$ is unknown. In particular, the focus will be on jumps which occur in
the rate parameter of the Poisson distribution. Thus,

$$ f_0(x) = \frac{(\theta_0 \tau)^x\, e^{-\theta_0 \tau}}{x!}, \qquad x = 0, 1, 2, \ldots $$

and

$$ f_1(x) = \frac{(\theta_1 \tau)^x\, e^{-\theta_1 \tau}}{x!}, \qquad x = 0, 1, 2, \ldots $$

where $\theta_0$ is known and $\theta_1 \in \Theta$ (see footnote 5). For simplicity, we will let $\tau = 1$ throughout. There
are many applications involving the Poisson distribution where quickest detection
procedures would be useful, as discussed in the previous section.
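For the Poisson densities above, the log-likelihood ratio processor has a simple closed form: the factorials cancel, leaving a term linear in the observation. The sketch below (function names are illustrative) compares the closed form with a direct evaluation of the density ratio for a few made-up rates.

```python
from math import exp, factorial, log


def poisson_pmf(x, theta, tau=1.0):
    """Poisson probability of count x with rate theta over an interval of length tau."""
    return (theta * tau) ** x * exp(-theta * tau) / factorial(x)


def poisson_llr(x, theta0, theta1, tau=1.0):
    """Closed form of log f_1(x)/f_0(x): x log(theta1/theta0) - (theta1 - theta0) tau."""
    return x * log(theta1 / theta0) - (theta1 - theta0) * tau


# Hypothetical rates: theta0 = 2 before the disorder, theta1 = 5 after it.
for x in range(6):
    direct = log(poisson_pmf(x, 5.0) / poisson_pmf(x, 2.0))
    print(x, direct, poisson_llr(x, 2.0, 5.0))
```

Since the processor is increasing in the count whenever the post-disorder rate exceeds the pre-disorder rate, larger counts always push the test statistic toward the alarm threshold.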


6.3  Conventional  Procedures 

In this section, some established approaches for detecting jump changes of unknown
magnitude are outlined. The most direct approach to this problem involves replacing
all of the unknown quantities with their respective maximum likelihood (ML) estimates.
For the present problem, the unknowns are the disorder time $m$ and the jump
magnitude $\nu$. The resulting procedure is called the generalized likelihood ratio test
(GLRT).

4 See Chapter 2 for the precise definitions of $D$ and $T$.

5 Notice that a Poisson random variable with rate parameter $\theta$ has a mean and variance both
equal to $\theta \tau$. Therefore, the jump in $\theta$ results in a change not only in mean, but also in variance.

At each time instant $n$, the pair $(k, \nu)$ is chosen to maximize the log-likelihood
ratio

$$ \sum_{i=k}^{n} \log \frac{f(X_i \mid \theta_0 + \nu)}{f(X_i \mid \theta_0)} $$

Define the test statistic

$$ S_n = \max_{1 \le k \le n}\ \max_{\theta_0 + \nu \in \Theta}\ \sum_{i=k}^{n} \log \frac{f(X_i \mid \theta_0 + \nu)}{f(X_i \mid \theta_0)} $$

The procedure is then to declare a disorder at time $n$ in case

$$ S_n > \gamma $$

The threshold $\gamma$ can be set based either on a criterion of minimum false alarm rate or
maximum expected delay. The GLRT is also useful when the probability densities
have parametric uncertainty. For example, if the noise densities were known to be
Gauss-Gauss mixtures

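$$ f(x) = \frac{1 - \epsilon}{\sqrt{2\pi}\, \sigma_0} \exp\left\{ -\frac{x^2}{2 \sigma_0^2} \right\} + \frac{\epsilon}{\sqrt{2\pi}\, \sigma_1} \exp\left\{ -\frac{x^2}{2 \sigma_1^2} \right\} $$

an ML estimate of the contamination factor $\epsilon$ could also be incorporated into $S_n$ in
a straightforward manner.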

The GLRT is similar to Page’s test in the sense that the test statistic $S_n$ is just
the likelihood ratio of all samples up to the current time instant. In fact, the optimal
Page’s test is just a degenerate version of the GLRT where $\Theta = \{\theta_1\}$. Unfortunately,
the GLRT has several undesirable properties. First, unlike with Page’s test, an
exhaustive search must be performed over all $k = 1, 2, \ldots, n$ and all possible $\theta \in \Theta$. As
a consequence, the GLRT does not readily admit a recursive implementation. Second,
since $m \in \{1, \ldots, n\}$, the search region increases linearly with $n$ (see footnote 6). Third, at each
time $n$, all past and current observables $X_1, \ldots, X_n$ must be stored.

6 In practical implementations, however, the search region would usually be restricted to some
finite interval. Another approach might be to employ some “intelligent” processing which only
retains those disorder time candidates which are the most probable based on the past data.
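To make the cost of the search concrete, here is a minimal sketch of the GLRT statistic for Poisson observables with unit interval length. It is not the implementation used in the thesis. It assumes the inner maximization is done in closed form: the Poisson log-likelihood is concave in the rate, so the constrained maximizer is the tail sample mean clipped from below at the minimum permissible rate. The function name and the data are hypothetical.

```python
from math import log


def glrt_statistic(xs, theta0, nu0):
    """Max over change times k and rates theta >= theta0 + nu0 of the Poisson log-likelihood ratio."""
    n = len(xs)
    theta_min = theta0 + nu0
    best = float("-inf")
    tail_sum = 0.0
    for k in range(n, 0, -1):              # candidate change times k = n, n-1, ..., 1
        tail_sum += xs[k - 1]              # sum of X_k, ..., X_n
        count = n - k + 1
        theta_hat = max(tail_sum / count, theta_min)   # constrained ML estimate of the rate
        value = tail_sum * log(theta_hat / theta0) - count * (theta_hat - theta0)
        best = max(best, value)
    return best


xs = [1, 0, 2, 6, 7, 5]    # made-up counts; the rate appears to jump after the third sample
print(glrt_statistic(xs, theta0=1.0, nu0=1.0))
```

Even with the closed-form inner step, every new sample requires a pass over all candidate change times and access to the full data record, which illustrates the first and third drawbacks listed above.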
Another approach when the jump magnitudes are unknown is to simply use the
optimal procedure designed for the minimum jump $\nu_0$. That is, implement Page’s
test using the processor

$$ g(x) = \log \frac{f(x \mid \theta_0 + \nu_0)}{f(x \mid \theta_0)} $$

Recall that Page’s test was defined using the test statistic

$$ S_n = \max\{ S_{n-1} + g(X_n),\ 0 \} $$

where a disorder is declared when $S_n > h$. Here, we refer to this approach as Hinkley’s
procedure, but it is also known as the Page-Hinkley test for the case when the noise
is Gaussian (and so $g(x)$ is linear) [3].
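A minimal sketch of Hinkley's procedure for Poisson observables with unit interval length is given below. The processor is the Poisson log-likelihood ratio evaluated at the minimum jump; the function name, the zero starting value, the data, and the threshold are illustrative assumptions.

```python
from math import log


def hinkley_poisson(xs, theta0, nu0, h):
    """Return the first time n (1-based) at which the statistic exceeds h, or None."""
    slope = log((theta0 + nu0) / theta0)   # coefficient of x in g(x)
    s = 0.0                                # statistic started at zero (assumption)
    for n, x in enumerate(xs, start=1):
        g = x * slope - nu0                # Poisson log-likelihood ratio at the minimum jump
        s = max(s + g, 0.0)
        if s > h:
            return n
    return None


# Made-up data: rate about 1 for three samples, then a jump well above the minimum.
print(hinkley_poisson([1, 1, 1, 5, 5, 5], theta0=1.0, nu0=1.0, h=3.0))
```

Only the current value of the statistic is stored between samples, in contrast with the GLRT.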

In considering the use of Hinkley’s test, care must be taken to ensure that the
log-likelihood ratio is monotonically increasing; such is the case with the Gaussian and
Poisson distributions. When this is so, observe that

$$ 0 < E[g(x) \mid \theta_0 + \nu_0] \le E[g(x) \mid \theta_0 + \nu] $$

for any $\nu \ge \nu_0$. Since the performance is proportional to this expectation (cf. the
definition of $\tilde{\eta}$ in Chapter 2), we see that the minimum performance is achieved
under the minimum jump scenario. Thus, one can design the procedure to guarantee
a nominal level of performance via the selection of the threshold $h$. In cases when
the log-likelihood is not monotonic (for example, with the Gauss-Gauss mixture),
Hinkley’s test might be a poor choice if the true jump magnitude could take on
values in a large range. An alternative would then be to use a suitable monotonic
nonparametric nonlinearity for $g(x)$, such as the sign detector or dead-zone limiter.
The performance of Page’s test using these processors with known jump is investigated
in [4]; a similar analysis could be done when the magnitudes of the jumps are unknown.
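For the Poisson case with unit interval length, the ordering of the expectations can be verified in one step. The following worked computation is a sketch under the Poisson model of Section 6.2 and uses only the fact that the mean of the observable after a jump of size $\nu$ is $\theta_0 + \nu$.

```latex
g(x) = x \log\frac{\theta_0 + \nu_0}{\theta_0} - \nu_0,
\qquad
E[g(X) \mid \theta_0 + \nu] = (\theta_0 + \nu) \log\frac{\theta_0 + \nu_0}{\theta_0} - \nu_0 .
```

Because the logarithm is positive, the expectation is increasing in $\nu$. At $\nu = \nu_0$ it equals the Kullback-Leibler divergence between the two Poisson distributions, which is positive, so both inequalities hold.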

Since Hinkley’s procedure is a version of Page’s test, it is desirable in that it can
also be implemented recursively as explained in Chapter 2. Examples of applications
which utilize Hinkley’s test are line-by-line edge detection [1], where the intensities on
each side of the boundaries are unknown, and the detection of changes in the quality
of links in communications networks [6], where deviations from the nominal probability
of bit error are monitored.

Another  class  of  detectors  involves  the  computation  of  the  sample  derivative  of 
the  mean  of  the  sequence  of  random  variables.  This  is  done  by  computing  a  weighted 
difference  of  a  subset  of  the  samples  before  and  after  the  hypothesized  disorder  time. 
This  gives  rise  to  so-called  filtered  derivatives  detectors. 

In  [2],  two  examples  of  this  type  of  detector  are  examined:  the  integrating  filter 
and  a  “triangular”  filter.  For  the  integrating  filter,  the  statistic 

$$ Z_n = \frac{X_{n+l} - X_{n-l}}{2l} $$

is  used  to  approximate  the  derivative  of  the  mean.  When  Zn  exceeds  a  threshold,  a 
disorder  is  declared.  The  statistic 


$$ Z_n \propto (X_{n+1} + \cdots + X_{n+l}) - (X_{n-1} + \cdots + X_{n-l}) $$

(suitably normalized) is used for the triangular filter in a similar manner.

It  is  shown  [2]  that  for  the  filtered  derivatives  algorithms,  both  T  and  D  are 
exponential  functions  of  the  chosen  threshold.  However,  for  Hinkley’s  test,  T  is  an 
exponential  function  of  the  threshold,  while  D  is  a  linear  function  of  the  threshold. 
A  consequence  of  this  fact  is  that  for  large  T,  the  value  of  D  will  be  much  smaller 
for  Hinkley’s  test  than  for  the  filtered  derivatives  procedures;  hence,  it  is  concluded 
in  [2]  that  Hinkley’s  test  is  superior  to  filtered  derivatives  procedures.  Furthermore, 
with  respect  to  the  GLRT,  Hinkley’s  procedure  has  the  advantage  of  being  recursive. 
Therefore,  the  performance  of  Hinkley’s  procedure  will  be  included  in  the  analysis  to 
follow.

6.4 An Adaptive Procedure

The discussion of the previous section motivates us to seek out a procedure with the
following properties: i) the procedure is recursive, making it useful when an on-line
algorithm is desired, and ii) the performance is similar to that of the optimal Page’s
test when the true $\theta_1$ is known. In this section, a heuristic procedure is presented
with these goals in mind.

In general, for practical detection and estimation problems, it is often the case
that at least one of $\theta_0$ and $\theta_1$ is unknown. A common approach then is to use
estimates of the parameters, obtained either on-line or via some historical data. For
example, suppose that the samples are distributed as either $\mathcal{N}(\theta_0, \sigma^2)$ or $\mathcal{N}(\theta_1, \sigma^2)$ at
all time instants (i.e. there is no disorder); this is just the classical hypothesis testing
problem. When $\theta_i$, $i = 0, 1$, are known, the sample mean provides a sufficient statistic
for deciding between the two hypotheses [8]. For the related composite hypothesis
testing  problem  where  the  means  are  unknown  and  0o  <  —  ^i>  this  same 

procedure  can  also  be  used,  where  the  performance  does  not  fall  below  that  of  the 
case  where  0,  =  i  =  0, 1. 

For the disorder problem, however, this approach is often not feasible for the
following reason. Before the disorder time, all of the data is generated according to
$f_0(x)$. If $m$ is relatively large, resulting in a long wait before the disorder occurs,
many observables will be available to obtain an estimate of $\theta_0$. This is the case, for
instance, in radar problems where the presence of a target occurs only after a long
period of time. Even if $m$ is small (excluding the degenerate case where $m = 1$), we at
least know that some of the samples were generated under $f_0(x)$. By comparison, the
estimation of $\theta_1$ is more difficult for two reasons: i) the disorder time $m$ is unknown,
and therefore so is the instant at which the observables are first generated from $f_1(x)$, and
ii) the use of many samples from $f_1(x)$ in order to obtain an accurate estimate for $\theta_1$
is in competition with the desire to make the decision as quickly as possible. As is
usually the case with any sequential detection scheme, the longer one is able to wait,
the more accurate a decision can be made.

Based  on  the  above  discussion,  it  is  clear  that  it  would  be  unwise  to  attempt  to 
estimate  9i  based  on  all  of  the  past  observations.  Instead,  it  would  be  desirable  to 
separate  the  pre-  and  post-disorder  observables  for  the  purpose  of  estimating  90  and 
6\.  If  m  were  known,  then  this  would  be  easy  to  do;  of  course,  this  is  also  not  useful 
since  we  would  then  know  when  the  disorder  occurred,  which  is  exactly  the  problem 
in  the  first  place! 

With this difficulty in mind, we now propose an alternative procedure for detecting
jumps in the mean of unknown magnitude, maintaining the assumption that the
magnitude of the jump is at least some minimum value $\nu_0$. This procedure consists
of two separate tests in series, which will be denoted $\mathcal{T}_1$ and $\mathcal{T}_2$. Test $\mathcal{T}_1$ is exactly
the Hinkley test introduced in Section 6.3. Test $\mathcal{T}_2$ is a variation on the classical
two-sided SPRT of Wald [7].

When using the SPRT to distinguish between two hypothesized means, it is usually
assumed that the two means are known a priori. However, here the value of $\theta_1$ is
not known, so we instead use the ML estimate based upon the samples $X_{j+1}, \ldots, X_n$,
where $j$ is the most recent stopping time of $\mathcal{T}_1$ and $n$ is the present sample time. Thus,
test $\mathcal{T}_2$ is similar to an “estimator-correlator” in the sense that an estimate of the true
parameter is correlated with the present data via the log-likelihood ratio; therefore,
we call the resulting procedure the estimator-correlator sequential probability ratio
test (ECSPRT).$^7$ Let $\theta_{\min} = \theta_0 + \nu_0$ denote the minimum possible value of $\theta_1$. The
test statistic for the ECSPRT is


$$
\ell_n = \sum_{i=1}^{n} \log \frac{f(X_i \mid \tilde{\theta}_n)}{f(X_i \mid \theta_0)} \tag{6.1}
$$

$^7$ This procedure is not an estimator-correlator in the true sense, but the idea is similar.
where the decisions as to the occurrence of a disorder are made as follows:

$$
\ell_n
\begin{cases}
\geq b, & \text{decide for } H_1 \\
\leq a, & \text{reject } H_1 \\
\in (a, b), & \text{compute } \ell_{n+1}
\end{cases}
$$

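Here $\tilde{\theta}_n = \max\{\theta_{\min}, \hat{\theta}_n\}$ and $\hat{\theta}_n$ is the ML estimate of $\theta_1$ given $n$ observations, which
in this case is the sample mean, and $(a, b)$ is called the continuation region, where
$a < 0 < b$. Note that $\hat{\theta}_n$ can be computed recursively as

$$
\hat{\theta}_n = \frac{n-1}{n}\,\hat{\theta}_{n-1} + \frac{1}{n}\,X_n, \qquad \hat{\theta}_1 = X_1
$$

Note also that as $n \to \infty$, $\hat{\theta}_n \to \theta_1$ when $\theta = \theta_1$ and $\tilde{\theta}_n \to \theta_{\min}$ when $\theta = \theta_0$.

Thus, for large $n$, the ECSPRT behaves like the optimal SPRT under $H_1$, but like
the “minimum jump” SPRT under $H_0$.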
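As an illustration of how the statistic in (6.1) might be evaluated in practice, the following is a minimal sketch for Poisson observables, the case used in the examples of Section 6.5. It assumes that every term of the sum uses the current clamped estimate, so that the statistic depends on the data only through the running total; the function names are hypothetical and are not part of the original procedure description.

```python
import math


def poisson_llr(x, theta, theta0):
    """log f(x | theta) / f(x | theta0) for a Poisson count x (the x! terms cancel)."""
    return x * math.log(theta / theta0) - (theta - theta0)


def ecsprt_statistics(samples, theta0, theta_min):
    """Return the ECSPRT statistic after each sample (Poisson assumption)."""
    stats = []
    theta_hat = 0.0   # recursive sample mean (ML estimate of the rate)
    total = 0         # running sum of the observations
    for n, x in enumerate(samples, start=1):
        theta_hat = (n - 1) / n * theta_hat + x / n
        total += x
        theta_tilde = max(theta_min, theta_hat)   # clamp at the minimum jump
        # Sum over all n samples of poisson_llr(X_i, theta_tilde, theta0):
        stat = total * math.log(theta_tilde / theta0) - n * (theta_tilde - theta0)
        stats.append(stat)
    return stats
```

Because the estimate changes with every new sample, the whole sum is re-evaluated at the new estimate; for the Poisson case this costs only a few arithmetic operations per sample.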

The configuration of the overall adaptive detector, termed the composite detector,
is shown in Figure 6.1. The purpose of test $\mathcal{T}_1$ is to signal the possibility of a disorder.
If an alarm sounds during this test, the second test is initiated. If test $\mathcal{T}_2$ results in
the acceptance of $H_1$, then the composite test is terminated and a disorder is declared.
If $H_1$ is rejected by $\mathcal{T}_2$, the composite test is “reset” and $\mathcal{T}_1$ is initialized and started
once again. Thus, a false alarm occurs in the composite test if and only if it occurs
in both $\mathcal{T}_1$ and $\mathcal{T}_2$.

Let $h > 0$ denote the threshold in $\mathcal{T}_1$, and let $a < 0 < b$ denote the lower and
upper thresholds, respectively, in $\mathcal{T}_2$. The composite test is then given by the following
algorithm:

1.  Initialize:  $i = 0$,  $S_0 = 0$

2.  Hinkley’s Test:

        $i = i + 1$
        $S_i = \max\{0,\, S_{i-1} + g(X_i)\}$
        if $S_i < h$, goto 2

3.  $k = i + 1$

4.  ECSPRT:

        $i = i + 1$
        $S_i = \sum_{j=k}^{i} \log \dfrac{f(X_j \mid \tilde{\theta}_i)}{f(X_j \mid \theta_0)}$
        if $a < S_i < b$, goto 4
        if $S_i \leq a$, set $S_i = 0$ and goto 2

5.  Declare a disorder at time $i$

where $\tilde{\theta}_i$ is based on samples $X_k, \ldots, X_i$.
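The steps above can be sketched in code as follows. This is only an illustrative sketch under stated assumptions: the observables are Poisson counts, the nonlinearity $g$ in Hinkley's test is taken to be the log-likelihood ratio for the minimum jump, and the ECSPRT stops when its statistic is at or beyond a threshold. The function name and argument names are hypothetical.

```python
import math


def composite_test(samples, theta0, theta_min, h, a, b):
    """Run the two-stage composite test on a sequence of Poisson counts.

    Returns the 1-based sample index at which a disorder is declared,
    or None if the data run out before a declaration is made.
    """
    log_ratio_min = math.log(theta_min / theta0)
    it = iter(samples)
    i = 0
    s = 0.0                                    # Step 1: initialize
    while True:
        # Step 2: Hinkley's test, designed for the minimum jump (assumed g).
        for x in it:
            i += 1
            s = max(0.0, s + x * log_ratio_min - (theta_min - theta0))
            if s >= h:
                break                          # alarm: hand over to the ECSPRT
        else:
            return None                        # data exhausted, no alarm
        # Steps 3-4: ECSPRT on the samples that follow the alarm.
        n = 0
        total = 0
        for x in it:
            i += 1
            n += 1
            total += x
            theta_tilde = max(theta_min, total / n)
            s = total * math.log(theta_tilde / theta0) - n * (theta_tilde - theta0)
            if s >= b:
                return i                       # Step 5: declare a disorder
            if s <= a:
                s = 0.0                        # reject H1, reset, back to Step 2
                break
        else:
            return None                        # data exhausted inside the ECSPRT
```

Both loops draw from the same iterator, so the ECSPRT only ever sees samples that arrive after the most recent Hinkley alarm, which matches the choice $k = i + 1$ in Step 3.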

In Section 6.3, it was mentioned that when the log-likelihood ratio is not monotonic,
the performance of Page’s test designed for the minimum jump cannot be
guaranteed if the magnitude of the disorder is actually larger. It was suggested that
a viable alternative would be to substitute an appropriate memoryless nonlinearity
$g(x)$ for the log-likelihood ratio. The same thing can be done in test $\mathcal{T}_1$, which is also
just Hinkley’s test.

6.5  Performance Evaluation

The performance of the composite test is evaluated by computing $D$ versus $\log T$ for a
range of parameters, allowing us to make direct comparisons with the optimal Page’s
test and with Hinkley’s test. An expression for the average sample number (ASN) of
the composite test in terms of tests $\mathcal{T}_1$ and $\mathcal{T}_2$ is determined. It is not clear how to
obtain explicit expressions for the performance of the ECSPRT, and so we use Monte
Carlo simulations to get an estimate of the true performance, as explained below.
Several examples that illustrate the performance of the composite test are given.

6.5.1  Analysis 

Denote the ASN of $\mathcal{T}_k$ when the rate parameter is $\theta$ as $\mathcal{N}_k(\theta)$ for $k = 1, 2$, and
let $\mathcal{N}_2^a(\theta)$ and $\mathcal{N}_2^b(\theta)$ be the ASN of $\mathcal{T}_2$ given that the test terminated at the lower
and upper boundary, respectively. Tests $\mathcal{T}_1$ and $\mathcal{T}_2$ are independent since the sets
of samples that determine their outcomes are disjoint and the samples themselves
are independent. Therefore, the two sub-tests can be analyzed separately and these
results can then be combined for the composite test.

Denote the ASN of the composite test under $\theta$ as $\mathcal{N}(\theta)$, and let

$$
\alpha(\theta) = \Pr\{\mathcal{T}_2 \text{ terminates at the upper boundary } b \mid \theta\}
$$

and

$$
\beta(\theta) = 1 - \alpha(\theta) = \Pr\{\mathcal{T}_2 \text{ terminates at the lower boundary } a \mid \theta\}
$$

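For notational convenience, we temporarily drop the explicit dependence on $\theta$. $\mathcal{N}$
is the sum over $j$ of the expected time to cycle through $\mathcal{T}_1$ and $\mathcal{T}_2$ exactly $j$ times,
weighted by the probability of exactly $j$ cycles occurring before the test terminates.
We have

$$
\begin{aligned}
\mathcal{N} &= (\mathcal{N}_1 + \mathcal{N}_2^b)\,\alpha + (2\mathcal{N}_1 + \mathcal{N}_2^a + \mathcal{N}_2^b)\,\alpha\beta + (3\mathcal{N}_1 + 2\mathcal{N}_2^a + \mathcal{N}_2^b)\,\alpha\beta^2 + \cdots \\
&= \alpha \sum_{i=0}^{\infty} \left[ \mathcal{N}_1 + \mathcal{N}_2^b + i\,(\mathcal{N}_1 + \mathcal{N}_2^a) \right] \beta^i \\
&= \mathcal{N}_1 + \mathcal{N}_2^b + \frac{\beta}{1 - \beta}\,(\mathcal{N}_1 + \mathcal{N}_2^a) \\
&= \frac{\mathcal{N}_1 + \mathcal{N}_2}{1 - \beta}
\end{aligned}
$$

where we use the fact that $\mathcal{N}_2 = \beta\,\mathcal{N}_2^a + \alpha\,\mathcal{N}_2^b$. Therefore, the ASNs under each
hypothesis can be obtained by computing

$$
\mathcal{N}(\theta_i) = \frac{\mathcal{N}_1(\theta_i) + \mathcal{N}_2(\theta_i)}{\alpha(\theta_i)}, \qquad i = 0, 1 \tag{6.2}
$$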

$\mathcal{N}(\theta_0)$ and $\mathcal{N}(\theta_1)$ above are analogous to the $T$ and $D$ of Page’s test, respectively.
Thus, the two tests can be compared by examining $D$ vs. $\log T$ and $\mathcal{N}(\theta_1)$ vs.
$\log \mathcal{N}(\theta_0)$ side by side.
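As a numerical sanity check on the closed form in (6.2), the short sketch below compares it with a direct term-by-term summation of the cycle expansion. The function names and the input values used to exercise them are hypothetical and serve only to illustrate the algebra.

```python
def composite_asn(n1, n2a, n2b, alpha):
    """ASN of the composite test from the closed form (N1 + N2) / alpha."""
    beta = 1.0 - alpha
    n2 = beta * n2a + alpha * n2b          # unconditional ASN of the second test
    return (n1 + n2) / alpha


def composite_asn_series(n1, n2a, n2b, alpha, terms=10_000):
    """Same quantity obtained by summing the cycle expansion directly."""
    beta = 1.0 - alpha
    total = 0.0
    for j in range(terms):                 # j = number of rejected cycles
        cycle_time = (j + 1) * n1 + j * n2a + n2b
        total += cycle_time * alpha * beta ** j
    return total
```

For instance, with `n1 = 20`, `n2a = 5`, `n2b = 8` and `alpha = 0.25`, both functions agree on an ASN of 103 samples (up to floating-point rounding).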

The form of the expression in (6.2) sheds some light on the nature of the composite
test. Observe that the ASN is inversely proportional to $\alpha(\theta)$, the probability of
crossing the upper threshold in test $\mathcal{T}_2$. When $\theta = \theta_1$, the test statistic in $\mathcal{T}_2$ will
have a positive drift, and so the test will terminate at the upper boundary with high
probability. Therefore, $\alpha(\theta_1)$ will be relatively close to unity, and so the values of
$\mathcal{N}_1(\theta_1)$ and $\mathcal{N}_2(\theta_1)$ will dominate the ASN expression. On the other hand, when
$\theta = \theta_0$, the test statistic exhibits a negative drift, resulting in a much smaller $\alpha(\theta_0)$.
In this case, $\alpha(\theta_0)$ dominates the ASN.

Unfortunately, it is not clear whether closed form expressions for $\alpha(\theta)$ and $\mathcal{N}_2(\theta)$
can be obtained. Therefore, the approach here is to use Monte Carlo simulations to
compute the ASNs. An unbiased estimator of $\mathcal{N}(\theta)$ is

$$
\hat{\mathcal{N}}(\theta) = \frac{1}{K} \sum_{k=1}^{K} N_k(\theta) \tag{6.3}
$$

where a total of $K$ trials are used, and $N_k(\theta)$ is the stopping time of trial $k$ when the rate
parameter is $\theta$. In order to obtain statistically meaningful results, a large number
of trials must be performed for each set of parameters, since the variance of $\hat{\mathcal{N}}(\theta)$
decreases as $1/K$. One problem with this method, though, is that when $\theta = \theta_0$ the
expected stopping time for each trial increases exponentially in the thresholds $b$ and
$h$, and so the time required to complete each trial becomes intractably large.

One way around this problem is to estimate the parameters that appear in (6.2)
separately. Observe that $\mathcal{N}_1(\theta)$ is the ASN of Hinkley’s test, a quantity which
can be computed using a number of techniques as explained in Chapter 2. Thus, this
quantity can be precomputed for the desired threshold $h$. The remaining parameters,
$\mathcal{N}_2(\theta)$ and $\alpha(\theta)$, can be determined by direct Monte Carlo simulation. The advantage
of this approach is that only the simulation of the ECSPRT is performed, as opposed
to the entire composite test, and therefore the delay associated with the execution
of test $\mathcal{T}_1$ is circumvented. The ASN of $\mathcal{T}_2$ can be obtained exactly as in (6.3), while
$\alpha(\theta)$ can be approximated by

$$
\hat{\alpha}(\theta) = \frac{1}{K} \sum_{k=1}^{K} 1\{\ell_i^{(k)} \geq b \mid \theta\}
$$

where

$$
1\{A\} =
\begin{cases}
1, & \text{if } A \text{ occurs} \\
0, & \text{otherwise}
\end{cases}
$$

denotes the indicator of event $A$ and $\ell_i^{(k)}$ is the ECSPRT test statistic for trial $k$ at
time sample $i$.$^8$
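A minimal sketch of this separate estimation is given below for Poisson observables. It simulates only the ECSPRT and returns the fraction of trials that end at the upper boundary together with the mean number of samples per trial. The Poisson sampler (Knuth's multiplication method), the cap on the number of samples per trial, and all function names are assumptions made for illustration; they are not taken from the simulations reported in this chapter.

```python
import math
import random


def poisson_sample(rng, lam):
    """Draw one Poisson(lam) variate by Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def run_ecsprt(rng, theta, theta0, theta_min, a, b, max_samples=100_000):
    """One ECSPRT trial on Poisson(theta) data; returns (hit_upper, n_samples)."""
    total = 0
    for n in range(1, max_samples + 1):
        total += poisson_sample(rng, theta)
        theta_tilde = max(theta_min, total / n)
        stat = total * math.log(theta_tilde / theta0) - n * (theta_tilde - theta0)
        if stat >= b:
            return True, n
        if stat <= a:
            return False, n
    raise RuntimeError("no boundary reached within max_samples")


def estimate_ecsprt(theta, theta0, theta_min, a, b, trials, seed=0):
    """Monte Carlo estimates of alpha(theta) and the ECSPRT ASN."""
    rng = random.Random(seed)
    hits = 0
    samples = 0
    for _ in range(trials):
        hit_upper, n = run_ecsprt(rng, theta, theta0, theta_min, a, b)
        hits += hit_upper          # indicator of an upper-boundary exit
        samples += n
    return hits / trials, samples / trials
```

The two returned values correspond to the indicator-based estimate of $\alpha(\theta)$ and to the estimator (6.3) applied to the second test alone; the ASN of Hinkley's test would still have to be supplied separately.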


$^8$ It is clear that the frequency of the event $\{\ell_i^{(k)} \geq b \mid \theta\}$ is inversely related to the value
of $b$. Thus, it is still possible to select $b$ large enough so that even the alternative method is too
computationally burdensome.


6.5.2  Comparison of the Performance of Each Procedure

In this section, the composite detector is compared to Hinkley’s test and the quickest
detector. Although the latter is not realizable in practice since $\theta_1$ would have to
be known, it is included as it is the optimal test and therefore provides a useful
standard for performance comparison. Each of the detectors was simulated with
Poisson observables with normalized rates $\theta_0 = 10$ and $\theta_{\min} = 11$ for several values
of $\theta_1$. One can think of $\theta_0$ as the rate under the “noise only” or ambient hypothesis,
while $\theta_1 = \theta_{sig} + \theta_0$ is the “signal plus noise” rate, with $\theta_{sig} \geq \theta_{\min} - \theta_0$. The signal-to-noise
ratio (SNR) is $10 \log_{10}(\theta_{sig}/\theta_0)$. For example, one can see that the minimum jump of
interest here is $-10$ dB.

Realizations of the composite test under hypotheses $H_1$ (for $\theta_1 = 15$) and $H_0$ are
shown in Figures 6.2 and 6.3, respectively. In the first figure, observe that a disorder
is declared only after both of the upper thresholds $h$ (of $\mathcal{T}_1$) and $b$ (of $\mathcal{T}_2$) are crossed.
In this example, each of tests $\mathcal{T}_1$ and $\mathcal{T}_2$ is performed once. In Figure 6.3 no disorder
occurs and, as one would want, none is indicated (i.e., the threshold $b$ is never crossed
in $\mathcal{T}_2$). However, notice that test $\mathcal{T}_1$ signals an alarm two times, but each time test
$\mathcal{T}_2$ quickly rejects the supposition that a disorder occurred. This illustrates the “two
step” nature of the composite test.

For the composite test, $\mathcal{N}(\theta_1)$ is obtained via direct Monte Carlo simulation, while
$\mathcal{N}(\theta_0)$ is obtained by computing $\mathcal{N}_1(\theta_0)$ and simulating the ECSPRT parameters as
explained in Section 6.5.1. Figure 6.4 shows a plot of $\alpha(\theta_0)$ versus the upper threshold
$b$ for $\mathcal{T}_2$. Observe that $\alpha(\theta_0)$ is an exponential function of $b$. The smallest value, that
corresponding to $b = 13$, was obtained by performing 300,000 trials, which produced
15 false alarms and took several days to run on a Sun 4 workstation.
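To give a rough sense of the precision of that smallest point, the sketch below computes the empirical frequency and its binomial standard error from the counts quoted above. Treating the trials as independent Bernoulli outcomes is an assumption made here for illustration, and the function name is hypothetical.

```python
import math


def binomial_estimate(successes, trials):
    """Empirical frequency and its binomial standard error."""
    p_hat = successes / trials
    std_err = math.sqrt(p_hat * (1.0 - p_hat) / trials)
    return p_hat, std_err


p_hat, std_err = binomial_estimate(15, 300_000)
relative_error = std_err / p_hat
```

With 15 false alarms in 300,000 trials the estimate is $5 \times 10^{-5}$ with a relative standard error of about 26 percent, which illustrates why so many trials are needed at large $b$.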

Figures 6.5-6.7 illustrate the performance of the composite detector as compared
to the optimal Page test (the “quickest detector”) and Hinkley’s test. For the composite
test, the plots were generated using Monte Carlo simulations as explained in
Section 6.5.1. For the quickest detector and Hinkley’s test, the Markov approximation
technique discussed in Chapter 2, Section 4 was used.

Figure 6.2: A sample realization of the composite detector when a disorder occurs. Here
$\theta_1 = 15$, $\theta_{\min} = 11$, and $\theta_0 = 10$. The vertical bar indicates the true disorder time.

Figure 6.3: A sample realization of the composite detector when no disorder occurs. Here
$\theta_{\min} = 11$ and $\theta_0 = 10$.

Figure 6.4: $\alpha(\theta_0)$ versus $b$.

The comparative performance of the composite detector when $\theta_1 = 50$ is shown
in Figure 6.5; this corresponds to an SNR of 6 dB. The samples were obtained by
letting $b$ take on uniformly spaced values from 1 to 13. We see that the composite
detector with $h = 6$ has both a greater expected delay (ED) and a greater mean time between
false alarms (MFA) than that with $h = 3$, for a fixed $b$. This occurs since a higher
threshold in test $\mathcal{T}_1$ means more samples will be required to cause an alarm. We
also note that the slope of the performance curves for the composite test and that of
the quickest detector are similar. This results from the fact mentioned in Section 6.4
that the ECSPRT behaves (asymptotically) like the optimal SPRT under $H_1$ and like
the “minimum jump SPRT” under $H_0$. From these findings, one might be tempted
to make $h \approx 0$ (i.e., do away with $\mathcal{T}_1$ completely). However, this would increase
the likelihood that the disorder will occur during test $\mathcal{T}_2$, which also increases the
likelihood that $\hat{\theta}_n$ will be based on samples under both $f_0$ and $f_1$, an undesirable
condition as discussed in Section 6.4. In the present analysis, the MFA and ED are
computed assuming that the disorder occurs when test $\mathcal{T}_1$ is active, a good assumption
when $h$ is not close to zero. The more general case where the disorder may occur
when either test is active is left for future study.

Figure 6.6 illustrates the case where $\theta_1 = 20$, for an SNR of 0 dB. For both this
case and that of Figure 6.5, observe that the composite test outperforms Hinkley’s
test for higher MFA; however, the particular MFA at which this occurs depends upon
the choice of parameters and $\theta_1$. This fact suggests that the decision to use the
composite test or Hinkley’s test depends on the desired MFA. Again notice that the
slope of the composite performance curve is approximately the same as that of the
quickest detector, but that there is an offset of a few samples between the two curves.
This offset arises from two factors. First, the minimum number of samples required
for the composite test is two, instead of only one for the quickest detector, since the
former is composed of two tests. Second, an additional delay is incurred in the former
due to the additional time required for $\hat{\theta}_n$ to converge “close enough” to $\theta_1$ to allow
the ECSPRT to react in a manner similar to the quickest detector.

Finally, Figure 6.7 shows the case where $\theta_1 = 11$, for an SNR of $-10$ dB. This
is the case where the true jump is that of the minimum assumed magnitude. Here,
the quickest detector and Hinkley procedure are the same test, and so only one
performance curve is shown for both. The performance curves of the composite test
are similar to that of the quickest detector. In particular, the slopes of the former
appear to be only slightly greater than that of the latter. This is not surprising, since the
optimal performance is achieved by using the quickest detector. We can also observe
that the offset between the two curves becomes less critical in that the percentage
increase in ED of the composite test over the quickest detector is smaller for smaller
$\theta_1$.
Figure 6.5: Performance of the quickest, Hinkley, and composite detectors for $\theta_1 = 50$
(SNR = 6 dB). For the composite detector, $a = -5$, $h = 3$ (o’s) and 6 (x’s).


Figure 6.6: Performance of the quickest, Hinkley, and composite detectors for $\theta_1 = 20$
(SNR = 0 dB). For the composite detector, $a = -5$, $h = 3$ (o’s) and 6 (x’s).

Figure 6.7: Performance of the quickest/Hinkley and composite detectors for $\theta_1 = 11$
(SNR = $-10$ dB). For the composite detector, $a = -5$, $h = 3$ (o’s) and 6 (x’s).

6.6  Conclusions

In this chapter, an adaptive approach for detecting a jump change of unknown magnitude
in the parameter of a random process was introduced. Our approach, termed
the composite test, is a heuristic procedure created with the goal of retaining the beneficial
properties of the optimal Page’s test, the quickest detector, even though the
jump magnitude is not known. Our examples focused on the case where the disorder
is a jump in the rate parameter of a Poisson process, but, as discussed in Section 6.4,
the procedure can be applied to other processes as well.

In Section 6.5, the composite test was analyzed. It was shown that the performance
can be expressed in terms of the performance of each of the sub-tests $\mathcal{T}_1$ and
$\mathcal{T}_2$. Because closed form expressions for the ECSPRT were not available, Monte Carlo
simulations were used instead.

The performance of the composite test relative to the quickest detector and Hinkley’s
test, which is Page’s test designed for the jump of minimum magnitude, was
evaluated using several examples. It was shown that the composite test outperforms
Hinkley’s test for higher mean time between false alarms. It also exhibits performance
that is similar to the quickest detector in the sense that the slopes of the composite
and quickest detectors are similar; therefore, the performance of these procedures
will be asymptotically similar, differing by a bias which is proportional to the true
jump magnitude. This work shows that the composite test is a viable procedure for
detecting jumps in the mean of unknown magnitude.

There are several directions for future work. First, it would not be difficult to
evaluate the performance of the composite test for other distributions, and also for
the case where the samples are correlated. Second, it would be desirable to obtain closed
form expressions for the performance of the ECSPRT, which would allow us to express
the performance of the composite test without the use of simulations. Finally, it would
be useful to develop an extension of the procedure to the multivariate case.

Conclusions


This  thesis  investigated  quickest  detection  procedures  for  several  types  of  disorder 
problems.  The  objective  throughout  was  to  minimize  the  expected  delay  in  detecting 
the  disorder,  subject  to  a  lower  bound  on  the  mean  time  between  false  alarms.  The 
contributions  of  this  thesis  are  given  below. 

7.1  Contributions 

Chapter  2  provided  an  overview  of  the  foundations  of  work  on  quickest  detection,  and 
served  as  a  prerequisite  for  the  rest  of  the  thesis.  The  asymptotic  performance  measure 
(APM)  was  shown  to  be  a  useful  figure  of  merit,  in  that  it  allows  us  to  characterize 
the  performance  of  a  detection  procedure  using  a  single  number.  Consequently,  the 
asymptotically  optimal  procedure  can  be  determined  via  the  maximization  of  the 
APM.  Since  the  computation  of  the  APM  is  not  always  feasible,  a  lower  bound  that 
approximates the APM was used in the design process. It was shown that the log-likelihood
ratio is necessary and sufficient to maximize this bound. Several techniques
for  computing  the  performance  of  procedures  to  detect  disorders  were  also  given. 

Quickest detection procedures when underlying noise models were partially unknown
were considered in Chapter 3. Specifically, the $\epsilon$-contamination and total
variation uncertainty classes were studied. The minimax robust quickest detector was
derived by applying the minimax criterion directly to the APM, and it was shown
that a saddlepoint solution exists for this problem. The minimax APM was shown to
equal the Kullback-Leibler (K-L) divergence, and so the least favorable distributions
are those which minimize this quantity; the robust processor is then the log-likelihood
ratio of the least favorable densities. Performance curves were generated which show
that the robust procedure works well for a number of members of the uncertainty
class, and in all but a few cases outperforms nonparametric techniques based on the
linear, sign, and dead-zone nonlinearities. For the weak signal case, we established an
equivalence between the APM, the classical efficacy, and Fisher’s information. The
weak signal robust detector was obtained by finding the least favorable distribution
for Fisher’s information. Performance curves were given to show the gains available
when robustness is built into the detection procedure.

The investigation of robust quickest detection procedures was continued in Chapter 4
for the case where the mean and/or noise covariance of a multivariate Gaussian
process is uncertain. As in the previous chapter, the robust procedure was obtained by
applying the minimax criterion to the APM. It was shown that the robust processor is
exactly the robust discrete-time matched filter. Particular solutions were presented
for several different uncertainty classes, each of which is based on the deviation from
some nominal parameter set. Some performance curves were given which illustrate
the tradeoffs when there is a mismatch between the assumed and actual levels of
uncertainty. The applicability of the robust procedure to non-Gaussian multivariate
processes was also discussed.

In Chapter 5, we examined the problem of designing quickest detection procedures
at the fusion center of a distributed detection system. An optimal procedure was derived
and compared to several alternative methods which are easier to implement in
that they are recursive and require less computation. It was shown that a slight modification
of the optimal scheme leads to a suboptimal procedure whose performance
differs negligibly from the optimal. A simple method for choosing the thresholds of
the local detectors was presented; specifically, the thresholds were selected to maximize
the asymptotic performance measure of the distributed system. A sensitivity
analysis revealed that the method results in overall system performance which is close
to optimal, even for small mean times between false alarms. Lastly, the relationship
between channel bandwidth and detection delay was evaluated. It was shown that
the optimum bandwidth is a function of signal strength. Perhaps contrary to intuition,
for weaker signals, the optimal performance did not result from the system
with the maximum bandwidth. Performance curves were presented which illustrate
the performance gain or loss as the bandwidth varies; these curves would be useful to
a designer who must make decisions based on the tradeoff between the cost
of bandwidth and system performance.

Finally,  the  disorder  problem  when  a  jump  of  unknown  magnitude  occurs  in  the 
mean  of  a  random  process  was  investigated  in  Chapter  6.  Optimal  methods  exist  when 
the  jump  magnitude  is  known  (Page’s  test  using  the  log-likelihood  ratio);  we  proposed 
an  adaptive  procedure  suitable  when  this  information  is  not  available.  The  procedure 
consisted  of  two  tests  which  operate  in  series.  The  first  test  signals  a  candidate 
disorder time, which the second test then uses to form an estimate of the post-disorder
mean value. This estimate was then incorporated into an adaptive version of
the  well-known  sequential  probability  ratio  test.  The  average  sample  number  (ASN) 
of  the  adaptive  procedure  was  derived  and  expressed  in  terms  of  the  two  sub-tests.  It 
was  shown  via  simulation  that  the  adaptive  test  has  similar  asymptotic  performance 
to  the  test  which  is  optimal  for  known  jump  size.  The  procedure  was  implemented 
to  detect  a  change  in  the  rate  parameter  of  a  Poisson  process;  however,  it  is  also 
applicable  to  other  distributions. 


